The interest and activity in the field of the measurement of
human characteristics have never been greater than today. It is
with this thought that the editors present the first issue of
Educational and Psychological Measurement. Educational institutions,
government, and industry are all giving increasing attention
to methods of evaluation aimed at determining the status and
promise of the individual. Improved methods in measurement
are being developed and significant research is being done in
many fields.


In spite of this rising interest, measurement is still a step- 
child. The contributions of measurement theory and practice have 
found expression in the publications devoted primarily to other 
fields. Nowhere has there been a common meeting ground for 
the exchange of ideas from area to area except for the more 
technically inclined. 


Yet there are measurement problems of practical and imme- 
diate concern which are common to many fields. The problem 
of estimating future success is common, for example, to the tasks 
of helping young people choose appropriate vocations, of select- 
ing employees, of admitting students to educational institutions 
and assigning draftees to jobs in the Army. 


The limited interchange of ideas and techniques in measurement
probably can be explained by the fact that there has been
no single journal which could be counted upon to report current
developments and to serve as a forum for the discussion of problems.
It is our purpose to remedy this situation. The pages of
Educational and Psychological Measurement will be open to contributions
from all fields in which techniques of human measurement
are used. Each issue of the journal will have departments
devoted to news and abstracts of recent literature. Future issues
will also carry a section on new tests.


It is hoped that the articles in the journal will not only be 
of interest to readers in the specific areas from which the articles 
come, but that they will be suggestive of improved procedures 
elsewhere. 


Washington, D. C. G. F. K. 
December 23, 1940. 
THE EVALUATION OF VOCATIONAL AND
EDUCATIONAL COUNSELING: 
A CRITIQUE OF THE METHODOLOGY OF 
EXPERIMENTS* 


E. G. WILLIAMSON AND E. S. BORDIN 
University of Minnesota 


With increasing attempts to systematize the concepts of coun- 
seling, to describe its techniques, and to delineate its objectives, the 
need for evaluative studies has become more insistent. Descriptions 
of programs of vocational and educational counseling usually close 
with a summary statement that further improvement in this field 
is dependent upon evaluative studies (40: chap. XXVII, 42, 43:
chap. IX, 44†). In other words, currently used techniques of counseling
must be subjected to scrutiny and evaluation in order that
more effective ones may be developed. Thus a fertile field for ex- 
perimentation may be found in this phase of student personnel work. 


Restricting Conditions 


A review of the peculiar conditions of this field of applied psy- 
chology is in order and should precede attempts to experiment. This 
paper will attempt to summarize, in a critical and systematic man- 
ner, the assumptions, criteria, methods of measuring outcomes, and 
possible experimental designs involved in the evaluation of educa- 
tional and vocational counseling. The treatment of personality, 
social, family and other types of students’ problems will be con- 
sidered only in relationship to educational and vocational adjust- 
ment. The evaluation of these other types of counseling—usually 
called personality counseling—should be the subject of another 
paper. 

When we speak of counseling, we refer to individualized efforts 
to help students discover vocational assets and disabilities and to 
plan an appropriate training program. The making of such an in- 
ventory of potentialities must be preceded by the collection and use 
of evidence of abilities, interests and motivations. The techniques 
involved in collecting, refining and using evidence have been de- 
scribed elsewhere (40: chap. III). 


*The report of a statistical evaluation of clinical counseling by the 
same authors will appear in the next issue of this journal. 
†See pages 22-24 for references.
For purposes of evaluation experiments, it is necessary also to
agree as to what counseling is not, to define it negatively. For 
example, we cannot accept the assumption that testing alone or that 
statistical prediction is counseling. Such would seem to be the 
assumption of Thorndike’s 1934 study (35) as an evaluation of 
counseling techniques. On the other hand, attempts to define coun- 
seling as self-analysis (by students) or as diagnosis based alone 
upon impressions, student hopes and interview data (by counselors) 
are equally unacceptable. Counseling must be based upon an under- 
standing of the student; but the counselor does more than make a 
diagnosis or prediction. Counseling is the process of helping the 
student to plan, and to utilize his assets. 

Progress toward adequate evaluation of counseling has been 
impeded by two types of attitudes held by some personnel workers. 
Some counselors evaluate by means of arm-chair methods. That is, 
the effectiveness and general worth of counseling is held to be self- 
evident. These persons reason that the general methodology of 
guidance must be effective because it appears to be an appropriate 
method of dealing with serious and widespread maladjustment 
among youth. Other personnel workers appear to believe that coun- 
seling cannot be evaluated. They maintain that the counseling 
process is so personal and individual that any attempt by the coun- 
selor to study it will impair his efficiency as a counselor and will 
create an artificial situation which will not even remotely resemble 
the real counseling relationship. 

On the other hand, those who believe that counseling can and 
should be evaluated have taken one of three approaches. First, 
there is the approach which clings to traditional statistical methodology
in utilizing only those criteria that are objectively quantifiable.
This approach is based upon the premise that a straightforward
statistical analysis of such data as grades, years in college,
number of jobs held or wages earned provides sufficient criteria for
evaluation experiments. Second is the approach which utilizes
non-statistical case study methods of evaluation. The third ap- 
proach attempts to avoid the objections to the other two methods 
by using various objective and systematically derived criteria 
which are combined by means of impartial judgmental treatment 
in contrast with statistical summations. 

The assumptions underlying criteria should be made explicit. 
Implicit assumptions have been the source of error in planning and 
interpreting some evaluation studies. For example, prediction has 
frequently been treated as though it were the beginning and end of 
guidance. Here again the interpretations of Thorndike’s study (35)
serve as an example, although others might also be used. The con- 
clusion, drawn by many from Thorndike’s study, that counseling 
was low in effectiveness, would not be objectionable if the interpre- 
ters had indicated that by guidance they mean statistical prediction 
of fragmented criteria. In speaking of prediction of fragmented cri- 
teria we refer to the fact that many research workers lose sight of 
the possibility that one datum often has different meaning and sig- 
nificance for different students. If such is the case, and we have 
every reason to believe that it is, then any attempt to use these bits 
of information either separately or in a rigid arithmetic combina- 
tion may obscure the actual outcomes of counseling. 

The supposition that specific objectives, such as an increase in 
academic achievement, will necessarily be common to all the cases 
in an experimental population must also be examined. If we cannot 
accept the supposition, then we must consider the possibility that 
the use of what is at best a partially applicable criterion is likely to 
reveal only slight differences, if any at all. For example, a low 
aptitude student who had been successfully counseled into with- 
drawing from college cannot be included in an experiment designed 
to reveal the effectiveness of counseling in increasing grades. 


There are two other considerations of this type that the careful
research worker must weigh in planning an effective evaluative
experiment. First, he must realize that in order to evaluate a program
of action, it must be carried out. The student must do something
following counseling in order to make evaluation possible.
A physician might just as well attempt to discover the effectiveness
of his medicine when his patient has taken it home and placed
it unused in his medicine cabinet. Secondly, a counselor may
change a student’s attitudes, but these must be revealed in observable
or measurable behavior or they cannot be evaluated. Any outcome
that is beyond the scope of some means of dependable observation
is one that cannot be dealt with and therefore must be rejected
by those who require more than blind faith.


The question of the optimum time interval for evaluation is one 
that needs further investigation before much progress can be made 
in evaluation experiments. It is possible that the optimum time 
interval will vary for each individual in any experimental group; 
or perhaps the longer the intervening time, the greater the possibility
for the intrusion of other influences that may tend to minimize
the effects of counseling. Some influences may facilitate adjustment
subsequent to counseling; others may cause maladjustment.
Even though counseling results in a distinct separation of
counseled from non-counseled in terms of subsequent adjustment, 
the randomization of subsequent influences may cause a regression 
toward the mean for both groups. 

The scant knowledge of specific counseling techniques has 
forced us to study the effectiveness of the total process. If certain 
techniques neutralize the effect of others, then the gross results 
would be negligible. As specific techniques are isolated and de- 
scribed, then new types of evaluation studies may replace the pres- 
ent gross experiments. Such studies, however, would not appear to 
be possible until more adequate descriptions of techniques are made 
available by those who actually counsel students. 


Formulating Hypotheses 


Counseling can be evaluated only if certain outcomes or criteria 
of effectiveness are assumed to result from the counseling process. 
These assumptions must be formulated as hypotheses to be “tested” 
by experimental and statistical analyses. But a second considera- 
tion is of equal importance. We must determine not only the results 
of counseling but, as in all scientific studies, the conditions under 
which these outcomes will be produced. We must answer this sec- 
ond question in terms of what kinds of counseling, what techniques, 
what types of counselors and work with what types of students will 
produce certain outcomes. Our problem, broadly speaking, then 
becomes, “What counseling techniques (and conditions) will pro- 
duce what types of results with what types of students?” 

Most counselors have empirically derived opinions, hunches and 
judgments as to what outcomes or effects they and the students 
are trying to achieve. But many of these outcomes are intangible 
and difficult to formulate as well as difficult to set up in an experimental
design. We may, however, achieve some degree of agreement,
for purposes of experimentation, on the following assumptions:

Effective counseling will lead to or result in:

1. Occupational orientation: understanding and acceptance
(choice) of a tentative and broad goal and of the educational
(training) means to that goal.

2. This goal will be appropriate to the student in that it
will be one which will utilize his aptitudes and interests and
will not demand either less or more (within a reasonable
range) aptitude than he possesses (actually and potentially).

3. The student will make reasonable progress toward this
goal (in training school).

4. The student will be “satisfied” (further motivated) by
that progress and with his chosen goal.

In order to achieve these outcomes it is necessary that:

1. The counselor secures the student’s cooperation (rapport
in the broad sense) in choosing (orienting himself
toward) a goal and the means to it; the desire to assay his
assets and interests.

2. The student generates enthusiasm to use his assets in
attempting to secure relevant training and to achieve the
chosen goal.

3. The student uses his aptitudes skillfully in securing
training in school.

4. The counselor and the student are able to alleviate,
relieve or remedy pressures and disabilities (family, financial,
emotional, etc.) which interfere with or prevent the eager
and skillful use of aptitudes and the choice of an appropriate
goal.

5. If these pressures or disabilities are too serious for
the counselor to cope with, then use is made of specialized
personnel workers.

6. The appropriate or reasonably approximate type of
training is available to the student.

The above possible outcomes may be the direct or indirect, 
immediate or long-term outcomes of counseling. They may reveal 
themselves or be observed indirectly and not always by means of 
the student’s verbal report to the counselor. For example, the stu- 
dent’s orientation may be revealed in his classroom grades. Some 
outcomes may be general in nature (results of any type of counsel- 
ing technique), and others may be highly specific. Likewise some 
techniques may produce one or more of the above outcomes when
used with any type of student having any type of problem. Other 
techniques may be highly specific. Much experimentation needs to 
be done before we can answer these subsidiary questions. It is most 
likely that counseling cannot be equally effective with all types of 
students and all types of conditions.


Experimental Designs 


Drawing upon empirical knowledge, we may describe the gen- 
eral outlines of a number of possible experiments which should 
reveal some of the outcomes of counseling. We shall restrict our- 
selves to the following possible criteria: academic achievement, 
appropriate choices, cooperation, satisfaction, success, quality of 
case work, predictive efficiency, composite criteria. 

Academic Achievement. The emphasis placed upon grades in 
educational circles has necessarily established them as the most
used criterion of the effectiveness of counseling. Most colleges and 
universities drop unsatisfactory students on the basis of their aver- 
age grades, and reward those who achieve high marks. There are 
two methods of experimental design applicable: the comparison of 
the student’s grade average before and after counseling (2, 4, 8, 19, 
24, 43: chap. IX); or a comparison of the average grade of coun- 
seled students with that of non-counseled students who have been 
matched for such characteristics as age, sex, level of ability, size 
and type of high school and high-school grades (13, 15, 20, 27, 
41, 42). 
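
The two designs just described may be sketched in a few lines of Python. This is offered only as an illustration: the function names and the grade-point figures are hypothetical and are not taken from any of the studies cited, and the matching is assumed to have been completed already, so that each counseled student is paired with exactly one non-counseled student.

```python
from statistics import mean


def before_after_gain(before, after):
    """Internal control: mean change in each student's own grade average."""
    if len(before) != len(after):
        raise ValueError("each student needs a before and an after average")
    return mean(a - b for b, a in zip(before, after))


def matched_pair_difference(counseled, controls):
    """Matched control: mean difference between each counseled student
    and the non-counseled student paired with him."""
    if len(counseled) != len(controls):
        raise ValueError("each counseled student needs one matched control")
    return mean(c - m for c, m in zip(counseled, controls))


# Hypothetical grade-point averages, for illustration only.
before = [1.0, 2.0, 1.5, 2.5]       # counseled students, before counseling
after = [1.5, 2.0, 2.0, 3.0]        # the same students, after counseling
controls = [1.25, 2.0, 1.5, 2.75]   # matched non-counseled students

gain = before_after_gain(before, after)
difference = matched_pair_difference(after, controls)
```

Neither figure by itself shows that counseling caused the change; the weaknesses of both controls discussed in the text apply to any such computation.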

Both methods of control have definite weaknesses. First of all 
it must be emphasized that grades are patently only one of the pos- 
sible desirable outcomes of counseling. In addition their reliability 
and validity, as a measure of scholastic achievement, have been 
seriously questioned. Of more importance are the dissimilarities in 
patterns of subjects taken by different students. This condition 
makes the criterion of average grade a shifting scale whose com- 
parability from student to student is questionable. Moreover, in 
cases where the student has been successfully advised to leave col- 
lege there will be no subsequent grades to evaluate. In the case of 
students counseled before matriculating in college, where no pre-counseling
grades are available, the before-and-after method is not at all applicable.

The method of control by matching is a traditional one in scien- 
tific experimentation. It theoretically provides us with a compar- 
able population for comparing the effect of counseling with the 
effect of “normal” (or random) conditions. At the present time, 
however, it is impossible to match individuals on the very factors 
that may be of importance, e.g., motivation, personality or emo- 
tional stability. In addition, it is difficult to collect a reasonable 
number of cases which will be matchable. While the method of 
internal control, i.e., comparing grades before and after counseling, 
does away with the matching problem, it leaves indeterminate the 
problem of the effect of “normal” conditions in comparison with the 
effect of counseling.

The use of standardized achievement tests is a possible alterna- 
tive to grades as a measure of academic achievement. Such tests 
would be more reliable and presumably would provide a more com- 
parable measure from student to student or group to group. This 
would be true in any one area of information but, where achieve- 
ments in a number of areas are to be combined, heterogeneity will 
again be introduced. As long as individuals differ in the patterns of 
their objectives and college subjects this factor of heterogeneity
will be a possible disturbance in the use of scholastic criteria of 
counseling effectiveness. Experiments should be made to determine 
the possible relevancy and validity of this type of criterion. 

Educational and Vocational Choices. When evaluating in 
terms of educational and vocational choices it may be assumed 
that the individual will achieve a more satisfactory life adjust- 
ment if he sets goals for himself that are neither too high nor 
too low for his potentialities (18, 32). Thus the task of the 
counselor is conceived to be, in part at least, to bring about con- 
gruence between those two factors. 

For any case we may compare the student’s statement of his 
objectives with his potentialities as judged from test data and rele- 
vant tryout experiences. The judgment of the degree to which bet- 
ter alignment has been achieved as a result of counseling may be 
made by the counselor himself, by an outsider who reads the case 
notes, by the student, or by all three persons. In favor of the first
procedure, one may contend that there are often subliminal data 
not included in the case record which would make the counselor’s 
judgment most accurate. On the other hand, we may encounter 
difficulty in separating judgment from desire since the counselor is 
not disinterested in the outcome. An added difficulty with this type 
of criterion is the frequency of student cases in which a temporarily 
uncertain choice is the most desirable outcome of counseling. 

An indirect measure of this criterion may be used if we assume 
that more information on educational and vocational topics will 
lead to a greater probability of congruence between aspirations and 
potentialities. It seems legitimate to expect the clinical counselor 
to aid the student in acquiring such information, although this type 
of function has usually been involved in group guidance procedures. 
For the appraisal of these two types of outcomes, tests and inven- 
tories of the Kefauver-Hand type may be used (17). By these 
means it may be possible to determine whether counseled students 
have more information on which to base their educational and voca- 
tional decisions than they had before counseling or than is pos- 
sessed by a matched uncounseled control group. Since the mere 
possession of occupational and educational information is not a 
major objective of counseling, experiments are needed to deter- 
mine the relationship between the possession of such information 
and the appropriateness of the choices made by students. Such 
crucial experiments have not yet been made in support of the 
relevancy for counseling of courses in occupational information. 
Cooperation with the Counselor. This criterion is based upon
the premise that effective results of counseling cannot be achieved
unless the counselor is en rapport with the student. The fact that
the student receives the advice of the counselor cooperatively is
taken as an indication of rapport and therefore as a criterion of the
effectiveness of counseling. Viteles seems to go even further when
he says: “That advice is followed is probably in itself an evidence
of satisfactory adjustment” (38: p. 75). Such a contention needs to
be evaluated experimentally. We would reject any attempt on a
physician’s part to prove the efficacy of a particular medical treatment
by means of evidence that the patient cooperates in submitting
to that treatment. We would certainly withhold judgment
until we ascertained whether his patient eventually had recovered
or died. Cooperation is a desired outcome of counseling but
chiefly as a means or condition necessary to other more basic outcomes.
In this sense it is a preparatory outcome or criterion of
counseling effectiveness.

The measurement of this criterion would be expressed in terms 
of the percentage of the group counseled that had shown various 
degrees of cooperation. Such a result is difficult to interpret since 
there is no standard for determining what either a statistically or a 
socially significant percentage would be. Further experimentation 
and experience would, of course, provide data for deriving such a 
standard. 
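
The tabulation described above is simple to carry out. The following Python sketch assumes, hypothetically, that each counseled student has been rated on one of three degrees of cooperation; the labels and the ratings are invented for illustration.

```python
from collections import Counter


def percentage_by_degree(ratings):
    """Return the percentage of the counseled group at each degree of cooperation."""
    if not ratings:
        raise ValueError("no ratings supplied")
    counts = Counter(ratings)
    total = len(ratings)
    return {degree: 100 * n / total for degree, n in counts.items()}


# Hypothetical ratings for eight counseled students.
ratings = ["full", "full", "partial", "none", "full", "partial", "full", "full"]
table = percentage_by_degree(ratings)
```

Such a table reports the percentages only. It supplies no standard for judging whether any one of them is statistically or socially significant, which is the difficulty noted in the text.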

The Student’s Satisfaction. Satisfaction of the student is 
deemed to be a desirable outcome of counseling. This satisfaction 
may embrace his educational and vocational objectives, the counsel- 
ing assistance, and finally the job that he ultimately secures. The 
student’s satisfaction with any of the three may be inferred from 
his verbal report, either on an interview basis or by means of an 
attitude test. Obviously many subtle or delayed satisfactions may 
not be readily observed or felt by the student. Dissatisfaction which 
results from frustration may be, and oftentimes is, followed by later 
reconciliation to substitute adjustments. 

Concerning satisfaction with educational and vocational objec- 
tives as criteria, two methods of control may be used. The satis- 
faction of the student may be measured before and after counseling 
or the satisfaction of a counseled group may be compared to that of 
a non-counseled group. In the case of satisfaction with counseling 
assistance (25, 39) neither of these methods is possible. To meas- 
ure a student’s satisfaction with counseling assistance before he has 
been counseled or when he has not been counseled would be meaningless.
We can only determine the percentage of students who expressed
degrees of satisfaction with the counseling assistance
received and compare the results for two or more counseled groups. 
In a sense this criterion is usable to determine which of two or 
more counseling methods, or counselors, is more effective. 


The systematic and quantitative data provided by the attitude 
scale technique have not as yet been exploited in the evaluation of 
counseling. There are three types of attitude scales that may be 
used. First, a scale measuring the student’s attitude toward the 
school and his educational training. Bell has already described 
such a scale for high school students (3: p. 117-23). Second, a scale 
measuring the student’s attitude toward his vocational objectives. 
Remmers has constructed such a scale and has used it in the apprai- 
sal of the effectiveness of group guidance (28, 29). Third, a scale 
measuring the student’s attitude toward the counselor and the 
counseling assistance. This type of scale has had practically no 
application. In fact we have found only two instances of its use 
reported in the literature (14, 23). The usual approach has been 
through the report of the individual to direct questioning. 


While the student’s report is the easiest way to determine satis- 
faction and cannot be ignored as one type of satisfaction response, 
it has many weaknesses. For example, it may conceal real dissatis- 
faction behind a rationalization process. It may be a reflection of 
dissatisfaction in some other area than education or vocation, e.g., 
social, recreational, sex. The desire to please the counselor because 
of fixation or gratefulness may lead to a report of satisfaction. In 
some cases it seems too much to expect a feeling of complete satis- 
faction even with the most successful counseling, since a counselor 
cannot be expected to overcome the false hopes of a lifetime in a 
relatively short period of time. If the individual’s stratum of society 
requires a level of aspiration far beyond his capabilities, the coun- 
selor cannot be expected to bring about complete and immediate 
satisfaction.

Satisfaction with a job has been the most frequently used criterion
of the effectiveness of vocational counseling (4, 5, 6, 16, 22,
23, 25, 30). In addition to the direct report of the student, scores 
on the Hoppock Job Satisfaction Blank and the number of voluntary 
shifts in jobs have been used as measures of job satisfaction. All of 
these criteria lend themselves to the use of both an internal and a 
matched control. But many objections are encountered to satisfaction
with the job, measured in any manner. Job dissatisfaction may
reflect dissatisfaction with the low starting salaries which are char- 
acteristic of most jobs rather than with the occupational choice 
resulting from counseling. The dissatisfaction may also be caused 
by local conditions on the job, e.g., an unpleasant supervisor, un- 
companionable workmates, instead of maladjustment to the work 
involved. Likewise there are special objections to the use of shifts 
in employment as a criterion, since it is difficult to distinguish a 
voluntary from a forced shift. A shift, as measured, is an all-or- 
none process and time does not allow for a measure of degrees of 
satisfaction or promotion. 


The method of internal control with job satisfaction as the cri- 
terion is different from the one previously outlined. In this case 
before-after comparisons are not applicable. Instead, those who are 
in an advised occupation are compared to those who are not or 
with those who are in an occupation not discussed with the coun- 
selor. For this control to be meaningful the categories of occupa- 
tions must be broadly interpreted according to their general 
functions. 


Success on the Job. This criterion assumes that effective 
counseling should lead students to seek and secure jobs in which 
they can be successful. It can be measured by employer’s reports, 
number of advancements, number of forced shifts and wages earned. 
The controls applicable are the same as those for the criterion of 
job satisfaction (4, 9, 16, 22, 25, 30). 


The use of a success criterion has at least four general weak- 
nesses. First of all, success is a relative matter, relative to the stu- 
dent’s ambitions and to the reactions of his social group to his 
achievements. Secondly, success may come years later with many 
other factors, unrelated to the original counseling, intervening to 
cause it. Success in school is a more immediate adjustment the 
student must make before the vocational adjustment is necessary. 
Thirdly, some students advance vocationally more quickly because 
of aids from parents or friends and not because of counseling. 
Finally, this criterion is complicated by the influence of the quality 
of placement work in the senior year of training and is only re- 
motely a criterion of counseling in the freshman year. 


Each of the methods of estimating this criterion has been seri- 
ously criticized (33, 34). Employer’s reports may be subject to 
error because of the influence of an adverse personal relationship 
between employer and employee unrelated to quality of work, because
of the state of the labor market or because of atypical successes
or failures at the time of the follow-up interview or questionnaire.
Quite often there will be problems of locating the
employer, especially when the student has experienced a number of 
shifts within a short period of time (9). The absence of standards 
for comparison and the difficulty in securing cooperation are also 
contributory factors to the unreliability of employers’ reports. 

Number of advancements in employment may be unsatisfac- 
tory as a criterion of success because the best occupation for an 
individual may be one in which there are few opportunities for 
advancement. In addition, in most cases advancement occurs over 
a long period of time. The longer the intervening time, the more 
difficult it is to determine whether the original counseling has 
been the decisive factor rather than any of the many intervening
influences. Likewise, the number of shifts in employment presents
drawbacks because of the difficulty in distinguishing voluntary
from forced shifts and the all-or-none nature of shifts in
jobs.

Paterson and Darley (26: p. 19) and, more recently, Lurie (21) 
have presented evidence which indicates that shifts in jobs may 
not always be reliable indices of the individual’s adjustment. 
The older study found that the number of job changes did not 
discriminate workers unemployed early in the depression from 
those unemployed late. Lurie found that workers discharged 
during retrenchment were, as a group, as capable as those 
retained. 

In order for comparisons on the basis of wages earned to be 
meaningful, it is necessary to compare individuals who are working
on jobs where comparable wage scales prevail, a difficult task.
Another objection is that wages may reflect extra-individual 
conditions beyond the scope of the counselor’s function. 

Quality of Case Work. The type and appropriateness of the 
various procedures and techniques used by the counselor are 
assumed to be the marks of good counseling. Studies using such 
criteria are, however, to be considered as preparatory to final 
studies of the effectiveness of guidance. It should be recognized, 
however, that unless thorough-going methods are used there is 
little point in making an experimental evaluation (42). 

A critical analysis of the techniques used by the counselor
and a critical reading of case history and interview notes are the 
most feasible methods to determine their appropriateness (7, 39). 
An unbiased but well-informed “outside” judge would seem to be
the most desirable agent to perform such an analysis of case 
records. There is a difficulty here in that a well-informed judge 
is likely to be one who has had counseling experience himself 
and is, therefore, unlikely to be free of convictions. This method 
is at best a rough measure of whether the counselor used the 
particular techniques judged appropriate by other counselors. No 
measure of the effectiveness of these techniques results from the 
use of this criterion. 

Predictive Efficiency. The efficiency of educational diagnosis 
by the counselor may perhaps be studied more accurately. One 
possible experimental setup would compare the efficiency of pre- 
diction for pre-college cases by the counselor to that of a statis- 
tical predictive equation (5). The problem could be further 
differentiated by comparing predictions made by the counselor on 
the basis of preliminary information, tests, questionnaire infor- 
mation and preliminary interview, with predictions after the first 
counseling interview. This would serve to determine the relative
importance or validity of the information and impressions collected
in the counseling interview as compared with case data, such as test
scores, available to the counselor before he confers with the
student. Another differentiating study would involve having a 
case reader, who had no counseling relationship with the student, 
predict educational achievement on the basis of all the informa- 
tion available up to, but not including, the counseling interview 
itself. Such predictions may be compared with those made by the 
counselor after he interviews the student. Such crucial experi- 
ments are needed; a preliminary one will soon be reported by 
the authors. 

There are two assumptions that may be applied here as a 
basis for evaluating the counseling program. One objective that 
may be assumed for a counseling program is that of enabling 
students to compensate successfully for their disabilities in order 
to succeed. If that is an objective, then the expected evidence 
of efficiency in the counseling program would be a lower prog- 
nostic efficiency of a test battery for counseled students than for 
non-counseled students. If counseling is effective in this sense, 
then students who, if left alone, would fail, may succeed. 

Another objective of counseling may be to bring all factors 
other than those of aptitude (interest, opportunity, working con- 
ditions and so on) to a common level. Thus the performance of 
the students would be distributed according to their levels of 
ability. The greater the excess of predictive accuracy for counseled
over non-counseled, the closer the counseling program will
be presumed to have come to the ideal—that of removing all 
influences other than ability which interfere with student 
achievement. 
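
The two expectations can be made concrete with a small sketch. The Python example below uses invented scores, not data from any study, and computes the correlation between a test-battery score and later grades separately for counseled and non-counseled students. Under the first objective one would look for a lower correlation among the counseled; under the second, a higher one. The function name and all numbers are assumptions chosen only for illustration, and the data happen to show the first pattern.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: test-battery scores and later grade averages.
battery = [40, 50, 60, 70, 80]
grades_non_counseled = [1.0, 1.5, 2.0, 2.5, 3.0]  # grades follow the battery exactly
grades_counseled = [2.2, 2.0, 2.4, 2.3, 2.6]      # low scorers helped to compensate

r_non = pearson_r(battery, grades_non_counseled)
r_counseled = pearson_r(battery, grades_counseled)
print(round(r_non, 2), round(r_counseled, 2))
```

With real data the two coefficients would also need a test of the significance of their difference, which this sketch omits.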

Composite Criteria. All of the criteria discussed above have 
been partial criteria, since none of them was assumed to be 
evaluating all the possible objectives of counseling. We turn 
now to possible methods by means of which a more comprehensive 
evaluation of counseling may be secured. 


The Use of a Judgment Criterion 

It is at this point that a clear schism appears between an 
approach which is narrowly statistical and an approach which 
makes use of statistical methods in conjunction with the experi- 
mental situation. The former point of view has the desirable 
objective of clear-cut results, but, in its blind adherence to tradi- 
tional method, produces results which are unlikely to be significant 
either statistically or socially. This method would mechanically pool 
all of the part-criteria either in some form of average or in a profile. 
The method of averages compounds the artificiality which previ- 
ously had been indicated as inherent in the use of the part- 
criteria, without reference to the individuality of each student. 
The method of profiles suffers from a lack of well developed 
statistical techniques for handling that type of data and, more 
seriously, from the fact that artificial data cannot be refined and 
validated by casting them into profile form.

Rather than sacrifice meaningfulness for neatness of statis- 
tical treatment, the other approach has clearly recognized the 
impracticability, at the present, of getting more than rough 
measures of the general efficiency of counseling (33, 37). It has
therefore attempted to use a judgment criterion by means of 
which the adjustment of the student is estimated in terms of his 
original problems and any of the available data, including the
part-criteria (16, 22, 27, 30, 31, 36, 38, 42, 43: chap. IX).

As described in Williamson and Darley the judgment of ad- 
justment is based upon a follow-up interview of the student (43: 
chap. IX). The status of the case at the time of follow-up is 
always considered in the light of the diagnosis and prognosis 
made earlier by the counselor. All of the various types of data
(grade achievement, the student’s statement of satisfaction and
adjustment with regard to vocational orientation and choice,
information concerning the student’s activities, judgment of general
attitude, etc.) are weighed by the judge according to their
relevance to the individual case. In this way the possible errors 
inherent in a non-personalistic interpretation of objective data 
are minimized. 

The simplest experimental design requires that either the 
counselor or a case reader make the judgment as to the degree of 
the student’s adjustment subsequent to counseling and in contrast 
with his pre-counseling adjustment. A detailed manual of direc- 
tions, including examples of degrees of adjustment, is necessary 
in such an experiment (43: chap. IX). The results are reported 
in terms of the percentage of students counseled who achieved 
various degrees of adjustment. If the counselor makes such a 
judgment, it should be pointed out that, if he is a good one, he 
will know subtle angles and attitudes which are unlikely to be 
explicitly stated in the case records and which would be over- 
looked by an independent case reader. Many of the subtle influ- 
ences in a case may even exist in unverbalized form for the coun- 
selor and have no possibility of appearing in the case record. 
In addition, the counselor knows, perhaps better than anyone 
else possibly can, what he has been trying to do. The disad- 
vantages of using the counselor’s judgment lie first of all in his 
special interest in the results which may lead to an approach 
which is either too self-critical or too self-lenient; and secondly, 
in the undesirable consequence that the counselor’s effectiveness 
in counseling may be decreased because of his awareness of his 
responsibility for evaluating his own efforts. 

While the use of the independent case reader obviates the 
possibility of impairing the counselor’s effectiveness and introduces
a theoretically impartial evaluator, it also has its drawbacks.
As has been indicated, the case reader may miss many of
the nuances. There are also so many conflicting philosophies and 
procedures and techniques in counseling that the case reader may 
be either unsympathetic with or ignorant of the counselor’s spe- 
cific objectives. In order to achieve greater impartiality and objec- 
tivity, two case readers and an arbitrator have been used. Wil- 
liamson has reported the use of this method with three trained 
workers who had nothing to do with the diagnosis and counsel- 
ing, but who collected data directly from the students for in- 
dependent and pooled judgments of effectiveness (42). With 
trained judges heterogeneity of point of view need not inter- 
fere with consistency in judgments. 
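
The bookkeeping for two case readers and an arbitrator can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure of any cited study: the three-step rating scale, the function names, and the ratings themselves are hypothetical. Where the readers agree their common rating stands; where they differ the arbitrator's rating is used. The results are then reported as the percentage of students at each degree of adjustment.

```python
from collections import Counter

def percent_agreement(reader_a, reader_b):
    """Percentage of cases on which the two readers give the same rating."""
    agree = sum(1 for a, b in zip(reader_a, reader_b) if a == b)
    return 100.0 * agree / len(reader_a)

def pooled_judgments(reader_a, reader_b, arbitrator):
    """Keep the shared rating where readers agree, else take the arbitrator's."""
    return [a if a == b else c
            for a, b, c in zip(reader_a, reader_b, arbitrator)]

def percent_by_degree(ratings):
    """Percentage of students at each degree of adjustment."""
    counts = Counter(ratings)
    return {degree: 100.0 * n / len(ratings) for degree, n in counts.items()}

# Hypothetical ratings for five students.
reader_a = ["good", "fair", "poor", "good", "fair"]
reader_b = ["good", "poor", "poor", "good", "good"]
arbitrator = ["good", "fair", "poor", "good", "fair"]  # consulted only on disagreements

agreement = percent_agreement(reader_a, reader_b)
pooled = pooled_judgments(reader_a, reader_b, arbitrator)
summary = percent_by_degree(pooled)
print(agreement, pooled, summary)
```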

While the experimental method outlined may yield evidence
of the degree of improvement in the counseled population, it 
does not prove that cooperation with the counselor’s suggestions 
was a necessary condition. Evidence for the latter may be ob- 
tained by estimating the degree of cooperation of each student by 
one of the methods previously discussed. By comparing the ad- 
justment achieved by students who cooperated, with the adjust- 
ment of those who did not, we can determine whether cooperation 
was necessary and to what degree. If we find that those who 
cooperate adjust better than those who do not, we still have a 
question of whether the degree of adjustment achieved by those 
who did not cooperate might not be equalled or bettered by those 
who received no counseling at all. A matched non-counseled 
group would seem to be the only means of providing an answer 
to this question. We have already discussed the possibilities of 
the matching process. If we were to attempt to avoid the
matching problem by counseling every other student who comes
for counseling, retaining the other half of the group as controls,
we would be doing violence to a social canon. The real solution 
must await a time when we have sufficiently isolated treatment 
techniques and problems to compare two treatments used with 
the same type of counseling problem. 


General Considerations 

Our consideration of the types of criteria and the methods of 
measuring them, feasible in the evaluation of the effectiveness 
of counseling, has touched upon definite limitations on exact 
evaluation of counseling. Whether these weaknesses will be 
insurmountable and will restrict evaluation to rough, rule-of- 
thumb methods depends upon future progress in experimentation. 

One type of difficulty is the inability to set up clear delinea- 
tions of the problems and variables involved. This has been 
traced first to the inadequacy of descriptions of diagnostic and 
treatment techniques of the counselor plus the gaps in knowledge 
of student problems (40: chap. XXVII). A second source is 
the element of uniqueness in the student’s problems and the counseling
techniques appropriate for them. The criteria which have
been considered have the weakness of being either too gross a
measurement or so far removed from the individual as to lack the
quality of meaningfulness. If, in the future, methods are devised 
for providing more adequate criteria, then more exact experi- 
mentation may be made. 

A second limitation, inherent in the nature of the counseling
situation, is the sampling difficulties involved in setting up con- 
trols. This difficulty has been stated by Murphy et al.: “In con- 
nection with the control group, there should be noted the relative 
impossibility of securing a true ‘control.’ The fact that the ex- 
perimental group applied voluntarily for counseling introduces 
a selective factor unmatchable among non-clients” (23: p. 952). 
Although this assumption has never been established even in the 
plethora of learning experiments, only assumed, caution should 
be applied in interpreting the results of matching experiments. 
We also lack adequate techniques for matching students for such 
pertinent factors as interest, perseverance and other similar qualities.
At the same time, it is impossible to set up an experiment
which would entail selecting cases from the general population, 
since willingness to be counseled would seem to be one of the 
necessary conditions for counseling. These limitations in sampling
methods imply that evaluation must necessarily be a long-time
process, involving a great deal of experimentation with
different methods.

The condition that diagnosis and counseling cannot be 
studied separately is a further complicating factor. When the 
counselor has made a diagnosis of the student’s problem, its
causes, and the types of treatments that are likely to solve it,
he cannot determine whether his diagnosis was correct unless the 
student carries out the recommendations. For example, if a 
counselor’s diagnosis states that student A can do effective work 
in college only by following certain of his recommendations, 
student A must remain in college for that diagnosis to be tested. 
The inability to control the conditions necessary for an adequate 
tryout of counseling recommendations often precludes determina- 
tion of the effectiveness of the advice. Factors which are often 
beyond the control of either the counselor or the student include 
restrictions imposed by the school administration, those imposed
by social codes, prejudices and attitudes of students or parents 
and lack of proper placement facilities. 

The criteria and methods discussed in this paper have little 
application for the comparison of individualized counseling with 
other types, e.g., group, traditional, casual interview, etc. This 
situation arises from the nature of the data yielded from these 
kinds of counseling. For example, the casual interview, by its 
very nature, does not produce very much information about the 
individual’s aspirations, his difficulties, or the counseling methods 
used by the counselor. Grade or information achievement represents
the only type of criterion that can be applied in evaluating
this type of counseling. 

Our discussion has shown that there is a need for more sys- 
tematic studies, using the more feasible part-criteria. Other 
approaches have been indicated as having possibilities. The rela- 
tionship between the student’s educational and vocational objec- 
tives and his level of ability should be studied further as a 
criterion of adjustment. Some studies have already yielded some 
preliminary results (1, 10, 29, 32). In the last few years this 
problem has been receiving attention under the term level of 
aspiration. Studies have revealed some provocative principles 
under laboratory conditions (11, 12), but we must learn whether 
these principles have validity for life situations. We should 
determine whether success in one area, i.e., vocational or educa- 
tional, has an effect on the level of aspiration in other areas of 
adjustment. What are the relations between level of aspiration 
and feelings of failure? Can vocational or educational success 
be such a potent factor that it would outweigh other experiences 
in determining an individual’s general success-failure feelings? 
How vital are social group factors in determining the individ- 
ual’s level of aspiration? To what degree do levels of aspiration 
persevere at various age levels? The answers to these questions 
would seem to be pregnant with implications for both the coun- 
selor and the evaluator. 

If and when our knowledge of student problems and of diag- 
nostic and treatment techniques has advanced sufficiently, we will 
have the opportunity to carry out more exact investigations. At 
this point, we can foresee experimental designs which should 
be applicable when such advances are made. One possibility is 
an experiment in which individuals having problem A will be 
divided into two groups, one which will receive treatment 1, the 
other treatment 2. In this way we may determine which specific 
techniques are most effective for a particular problem. 

Another plan of experiment could be designed to determine 
for what types of problems a treatment is applicable. Here, two 
groups, one representing problem A, the other problem B, would 
both receive treatment 1. Both methods could be expanded to 
include all types of treatments and problems. Criticism of these
designs may be directed at the apparent assumption that problems 
may appear isolatedly. That this is extremely unlikely cannot be 
denied. Yet, assuming advances in our techniques and procedures,
it seems possible that the types of factorial design used in
analysis of variance can be utilized to take care of the effects
of interactions among treatments for various problems. The 
success of such an experiment will depend upon the discovery of 
a number of cases in which one problem is clearly present and 
other types are minimal in significance or complexity. 
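
The two designs can be combined in a two-by-two layout, with problems A and B crossed with treatments 1 and 2. The Python sketch below works only with cell means and is not a full analysis of variance; the mean adjustment ratings are hypothetical and the names are assumptions for illustration. It shows how the effect of treatment within each problem, the average treatment effect, and the interaction between problem and treatment would be read from such a table.

```python
# Hypothetical mean adjustment ratings (higher = better) for a 2 x 2 design:
# the first key element is the problem, the second is the treatment.
cell_means = {
    ("A", 1): 3.0, ("A", 2): 2.0,
    ("B", 1): 2.5, ("B", 2): 3.5,
}

def treatment_effect(problem):
    """Treatment 1 mean minus treatment 2 mean for one problem."""
    return cell_means[(problem, 1)] - cell_means[(problem, 2)]

effect_a = treatment_effect("A")
effect_b = treatment_effect("B")
# Main effect of treatment: the average of the two within-problem effects.
main_effect = (effect_a + effect_b) / 2
# Interaction contrast: how far the treatment effect depends on the problem.
interaction = effect_a - effect_b
print(effect_a, effect_b, main_effect, interaction)
```

In these invented figures the average treatment effect is zero while the interaction is large: treatment 1 suits problem A and treatment 2 suits problem B, which is the kind of finding neither single-factor design would reveal alone.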


Summary and Conclusions 


1. All available methods of evaluation have weaknesses. 

2. Composite criteria which avoid arithmetic combination of 
the part-criteria are at present least open to question, although 
they are still crude measures.

3. The problem of securing sufficient data without doing 
violence to the concept and practice of counseling is a real one. 
Involved also are the inadequacy and incompleteness of most 
available case records. 

4. The proper time interval to use for evaluation is extremely 
important because of the possible relationship between the inter- 
vention of confusing factors and the length of time between 
counseling and evaluation. 

5. The methods used for validation of diagnostic and prog- 
nostic tools (e.g., tests) may not be applicable because of the 
uniqueness of each counseling situation. Stated another way, the 
methods of studying students in general may not be applied to 
the study of individual students with particular problems. 

6. An impediment to more exact evaluation is the inability to 
control conditions for an adequate test of counseling recommen- 
dations. 

THE LOGIC OF AGE SCALES*


M. W. RICHARDSON
United States Civil Service Commission 


An age scale is a type of psychological test designed to 
measure general mental ability (“intelligence”) in terms of 
performance of various mental tasks found to be normal for 
various ages. The child whose performance is typical of ten- 
year-old children, for example, is said to have a mental age of 
ten. The method was invented by Alfred Binet. Although the 
techniques have been modified in detail by British and American 
psychologists and the device of the intelligence quotient (I.Q.) 
has been appended, the main outlines of Binet’s work have been 
retained. The most widely used of the age scales is the Stan- 
ford Binet. 

The age scale has been widely accepted. The I.Q., in particu- 
lar, has passed into the language of the general public, together 
with the common misconceptions connected with its brief history 
in science. The fact that a device has attained wide use is not a 
guarantee of its soundness; and it is sometimes necessary in the 
interest of sound scientific advance to examine critically proce- 
dures and devices in common use. If the criticisms in this paper 
seem to be directed chiefly to one particular age scale, the expla- 
nation is that this one scale is the most widely used and has 
been most carefully constructed and standardized. 

A person whose academic specialty is the logic of science 
addressed a group of psychologists on the necessary conditions of 
measurement. He discussed the familiar matter of equality of 
units and the operational test of equality of units by the coinci- 
dence of any part of the scale with any other upon superimposi- 
tion. He mentioned the matter of measuring a single variable at 
a time, and the necessity of having a real origin of measurement, 
if ratio comparisons are to be made. He followed this sound 
discussion of the scientific method with a curiously erroneous 
one; he congratulated psychologists on having, in the Binet age
scales, a measuring device that meets all three of the requirements
for a scientific measuring device. The writer and others carefully
pointed out to the speaker that the age scale meets not one of the
three requirements set up: it has no real origin of measurement,
its units are not equal, and it does not isolate a single, unitary
variable for measurement.

* This article is adapted from a chapter in a forthcoming book on test
theory by the same author.

When it is uncritically considered, the age scale idea seems 
to be a happy one. What could be more simple and direct than 
to see how high the child can ascend an evenly graded scale? 
The concept of normality of performance made it the most 
natural thing in the world to describe such test performance in 
terms of the age for which the performance was normal. The 
concept of the age scale is a deceptively simple one, however, and 
the writer is of the opinion that the unstated special requirements 
and limitations of the age technique have received too little atten- 
tion. It is true that during the twenty-one years between 1916 and 
1937 many papers pointing to difficulties in scaling, scoring, and 
interpreting the Stanford Binet appeared. Moreover, several 
issues were kept continually in the foreground of attention. 
Unfortunately, the distinction between purely psychometric issues 
and psychological issues was not always made. The result is that 
certain deficiencies and limitations belonging to the mechanics 
of test construction were misinterpreted as psychological issues. 
A case in point is the constancy of the I.Q., about which it will be 
necessary to say more later. 


Validity of Age Scales 


A Binet scale consists of a series of sub-tests or items designed 
to measure “general intelligence,” whatever that may mean. For 
example, the 1937 edition of the Stanford Binet contains 127 sub- 
tests graded in difficulty from tests suitable for two-year-olds to 
those suitable for superior adults. The sub-tests were selected in 
the process of construction from a larger number of sub-tests. 
It is pertinent to inquire into the method of selection of the sub- 
tests. In what respect does the method of selecting the sub-tests 
insure that the resulting scale will be valid? One of the devices is 
to plot the percentage of correct responses to any given sub-test 
against the chronological age, after the sub-test has been applied 
to “unselected” children of various age groups. The age at which 
just half of the children pass the test is taken as the scale-position 
of the item. The plots of percentage of correct answers against 
chronological age differ from item to item; some curves are 
steeper than others. Theoretically, the items with steep curves 
are selected to make up the scale; actually, in the construction 
of the Binet tests many compromises with practical expediency
are made. Under special conditions a mathematical function can 
be used to describe a discrimination curve of items against chron- 
ological age. It has been shown that the use of this function is 
simply an alternative to the correlation methods, and precisely 
equivalent to them under certain special conditions.
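
The location and steepness of an item can be read from its plot in a simple way. The Python sketch below is an illustration under stated assumptions, not the procedure used in constructing any actual scale: it takes hypothetical percentages passing at each age, finds by linear interpolation the age at which just half of the children pass, and reports the slope of the curve at that point as a crude index of steepness. The function name and the figures are invented.

```python
def scale_position(ages, pct_pass, level=50.0):
    """Age at which `level` percent pass, by linear interpolation.

    Also returns the slope (percentage points per year) of the segment
    that crosses `level`, as a crude index of steepness.
    Assumes the percentage passing rises with age.
    """
    points = list(zip(ages, pct_pass))
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 <= level <= p1 and p1 > p0:
            slope = (p1 - p0) / (a1 - a0)
            return a0 + (level - p0) / slope, slope
    raise ValueError("the curve never crosses the requested level")

ages = [6, 7, 8, 9, 10]
steep_item = [5, 20, 80, 95, 100]  # hypothetical percentages passing
flat_item = [30, 40, 50, 60, 70]

pos_steep, slope_steep = scale_position(ages, steep_item)
pos_flat, slope_flat = scale_position(ages, flat_item)
print(pos_steep, slope_steep, pos_flat, slope_flat)
```

On the selection rule described above, the first item would be preferred because its curve rises six times as fast near the fifty per cent point.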

This type of analysis throws light on the “validating” proce- 
dure. The retention of sub-tests on the basis of sharp curves of 
age discrimination is the same as retaining items that have
large correlations with chronological age. The criterion of valid- 
ity is simply chronological age, and the practical effect of the 
procedure is to select items that have relatively high correlations 
with chronological age. 

The procedure leads to a serious difficulty. The standing high 
jump, or other athletic skills, yield similar discrimination func- 
tions, since they are likewise positively correlated with chrono- 
logical age. The method of item selection thus breaks down as a 
way of attaining validity. The only criterion of validity remain- 
ing is the judgment of the persons constructing the scale. An 
allied consideration is that the selection of items on the basis of 
high correlation with mental age on the same or a previous scale
is merely a measure of internal consistency or reliability. Nothing 
in the general procedure operates towards the selection of items 
that measure a unique trait. An interesting logical difficulty 
appears. Suppose that, out of the hand-picked collection of items 
supposedly measuring the aspects of intelligence desired in the 
scale, the items selected are those which have the steepest dis- 
crimination functions. Let us assume further that two items are 
so discriminating and so far apart in proper age-location that 
their discrimination functions do not overlap. The result is that 
the two hypothetical “good” items or sub-tests have a zero corre- 
lation. A scale made up of such items must necessarily be unreli- 
able as a composite measure. Furthermore, to the extent to which 
the search for valid sub-tests by this procedure should be success- 
ful, the number of different factors measured would increase. 
Evidence at present suggests that no fewer than six different 
mental functions are measured in a higgledy-piggledy fashion 
by the Stanford Binet. The multiplicity of factors is perhaps not
so serious as the fact that different things are measured at differ- 
ent ages. The sobering fact about the age-scale technique is that 
we do not know what is being measured, or what any given intelligence
quotient means in terms of the relative standing of the
individual. 

As an added complication, the score received by the testee is 
expressed in terms of mental age. Mental ages are statistical 
numbers based on the concept of normal or average test perform- 
ance of children of a given age, when the sample of children is 
representative of some population of children. On page 25 of 
Measuring Intelligence, Terman and Merrill state that the expres- 
sion of a test result in terms of age norms rests upon no statistical 
assumptions. The statement is erroneous and misleading. The 
truth of the matter is that the mental age is a measure derived 
from raw scores in accordance with certain assumptions; it is as 
definitely statistical in nature as the standard score, for example. 

In using scales of the Binet type, we choose to express test 
performance, not in the arbitrary units of number of items passed, 
but in terms of “mental years and months.” The raw score “units” 
are of course arbitrary, in the sense that they are not units at all. 
The child who answers 12 items correctly cannot be said to exceed 
the child who passes 9 by the same amount that the latter sur- 
passes a third child who answers 6 correctly. No ordinary test 
can be expected to satisfy the additive property required for 
measurement on a scale. But the mental year or the mental month 
is likewise not a real unit of measurement. In order for the 
mental year (or month) to be a real unit of measurement, it 
would be necessary for the function representing mental growth 
to increase regularly with chronological age. If, during each 
year, a child had the same increment of mental growth, the men- 
tal year or mental month would be constant in value. However, 
it is commonly agreed that the child matures less and less rapidly 
as he grows older, in intelligence as well as in physical character- 
istics. The annual increment of “intelligence,” i.e., a mental year, 
steadily becomes less until mental maturity is reached, at which 
time it is zero. An age scale is, therefore, not a true scale because 
it is not built up from equal units. In this connection, it may be 
noted that the true shape of the mental growth curve cannot be 
determined from scores expressed in terms of mental ages. If 
such were attempted, one would get results predetermined by 
the crude growth curve adopted in order to express raw scores 
in terms of mental ages. 

Whatever the merits of the assumed growth curve may be, the 
crucial consideration is that its true shape and its upper limit 
cannot be determined by use of a “scale” expressed in terms of 
mental ages. The true shape of the mental growth function can
be determined, strictly speaking, only by use of a scale with real
units of measurement. Once the mental growth function is estab- 
lished, it is possible to calibrate the underlying true scale in 
terms of mental ages, if desired. Then the interval between the 
mental age of four and the mental age of five might be expressed 
as a certain fraction of the scalar unit, the mental year between 
five and six as a somewhat smaller fraction of the same real unit, 
etc. Finally a place would be reached where the mental year is 
a negligible fraction of the real unit, and therefore has a value 
of practically zero. We would then have a proper (although 
indirect) experimental solution of the problem of the limit of 
mental maturity. The widespread use of mental ages has not 
helped to solve the problem, mainly because the use of mental 
ages as derived measures begs the question. 
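
The calibration described above can be illustrated numerically. Since the true growth function is not known, the Python sketch below simply assumes one: a negatively accelerated curve approaching a ceiling, with an arbitrary rate. The functional form, the ceiling of 100 scale units, and the rate of 0.2 are all hypothetical and are chosen only to show how successive mental years would shrink when expressed in equal units.

```python
from math import exp

def growth(age, ceiling=100.0, rate=0.2):
    """Hypothetical mental growth curve on a scale with equal units."""
    return ceiling * (1.0 - exp(-rate * age))

# Width of each "mental year" expressed in units of the underlying scale.
widths = {age: growth(age + 1) - growth(age) for age in range(4, 16)}
for age, width in widths.items():
    print(f"mental year {age} to {age + 1}: {width:.2f} scale units")
```

Under this assumed curve each mental year is smaller than the one before it, and the later ones approach zero, which is the pattern the argument anticipates.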

Although the exclusive use of mental ages forever begs the 
question of the limits of mental maturity, it is urged that a 
definite and well-accepted social meaning has been attached to 
them. It seems simple enough to define mental age as the average 
or median test performance of typical nine-year-old children. 
The definition works well enough until the limit of mental matur- 
ity is reached. If the limit of maturity is assumed to be 15, a 
mental age of more than 15 is impossible, by definition. Mental 
ages of more than 15 are assigned in the process of standardiza- 
tion to test performances by use of “cut-and-try” procedures 
based on some not well-defined assumptions. At best it is unfor- 
tunate that the definition of mental age must be radically shifted 
at one point or region in the age scale. 


The I. Q. and Its Troubles 


To multiply confusion, the device known as the intelligence 
quotient has been adopted. The intelligence quotient is defined as 
100 times the ratio of mental age to chronological age, and is thus 
an index of brightness. An index of brightness can of course 
be no more than a statistic relating the individual’s test perform- 
ance to the average performance of those of the same age. It is 
exactly as true of I.Q. as it is of other possible statistics serving 
the same purpose that one must always use it in connection with 
some measure of variability of test performance within the age 
group. Obviously one measure taken from a distribution has no 
meaning unless a measure of dispersion is given. It might be 
argued that clinicians keep in mind some kind of subjective scale 
which serves in lieu of the ordinary statistical parameters. If so,
the mental feat is remarkable since the various age groups, in one 
age scale at least, have dispersions of intelligence quotients which 
vary considerably. The difficulty of interpretation of the single 
statistic (the I.Q.) is greatly increased where the standard devi- 
ation of I.Q.’s of one chronological age group may be twice as 
large as that of another age group. The intelligence quotient 
shares with the mental age from which it is derived the fictitious 
character of measures above the maturity level. It is nonsense 
to describe an adult as having an I.Q. of 120 because such a statement
is based on an irrational definition and is unverifiable experimentally.
It is the practice of Binet testers to assume some definite
chronological age as the upper limit of mental growth. Thus,
it is assumed on at least two age scales that the average child
reaches the upper limit of mental growth at the age of fifteen. The crucial
difficulty in making such an assumption is not that the upper
level set may be wrong, but that its correctness cannot possibly
be checked by means of age scales.
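
The dependence of the I.Q. on the dispersion within the age group can be shown with a short computation. The Python sketch below applies the definition given above, 100 times mental age over chronological age, and then locates the same I.Q. within two hypothetical age groups whose standard deviations differ by a factor of two. The group means and standard deviations are invented for illustration and are not taken from any scale.

```python
def intelligence_quotient(mental_age, chronological_age):
    """I.Q. as defined in the text: 100 times mental age over chronological age."""
    return 100.0 * mental_age / chronological_age

def standard_score(iq, group_mean, group_sd):
    """Position of an I.Q. within its age group, in standard deviation units."""
    return (iq - group_mean) / group_sd

iq = intelligence_quotient(mental_age=12.0, chronological_age=10.0)
# Hypothetical age groups whose standard deviations differ by a factor of two.
z_narrow = standard_score(iq, group_mean=100.0, group_sd=10.0)
z_wide = standard_score(iq, group_mean=100.0, group_sd=20.0)
print(iq, z_narrow, z_wide)
```

The same I.Q. of 120 lies two standard deviations above the mean in one group and only one in the other, so the bare quotient does not fix the individual's relative standing.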

One of the moot questions about the I.Q. is its constancy. 
It seems unfortunate to the writer that so much time of psycholo- 
gists has been wasted on such a matter. It seems that what ought 
rightly to be merely a formal problem in test construction has 
been translated into one of spurious psychological significance. 
The only question properly asked at this time may be definitely 
stated: Did the authors of the age scale succeed in constructing 
a device which gives a constant I.Q.? Questions involving changes 
in I.Q. possibly attributable to environmental factors must always 
take into account the fluctuations of the I.Q. which are due to the 
test and to the statistical operations used to determine the intelli- 
gence quotient. Failure to consider the expected magnitude of 
fluctuation of the I.Q. may easily result in gross misinterpreta- 
tions of time-changes in its value for any one individual. 

Before we can properly evaluate the effect of organic and 
environmental factors on the intelligence quotient, we are forced 
to consider the variations in the I.Q. inherent in the testing 
technique. The fluctuations are those associated with the concept 
of reliability. For the 1937 Revision of the Stanford-Binet Scale 
the estimated reliability coefficient varies from 0.90 to 0.98, the 
higher reliability being associated with the lower I.Q. intervals. 
A median value for those near 100 I.Q. is 0.92. A representative 
value of the standard error of measurement is 4.5. Certain sys- 
tematic variations in the individual I.Q. are also found. The 
practice effect is one type of systematic variation. Terman and
Merrill estimate that the mean increase in I.Q. on the second test 
(which means on the other form, since two forms, L and M, are 
provided) ranges from 2. to 4.4, when the time interval between 
testings is short. The increase due to practice effect is presum- 
ably greater when the same form is repeated. 
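
The reliability coefficient and the standard error of measurement quoted above are linked by the usual formula, SEM = SD x sqrt(1 - r). The sketch below is illustrative only: the I.Q. standard deviation of 16 is an assumption (the text does not say which dispersion underlies the figure of 4.5), and the function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Assumed I.Q. standard deviation, chosen for illustration (not from the text).
ASSUMED_SD = 16.0

# The three reliabilities are the lowest, median, and highest values quoted above.
for r in (0.90, 0.92, 0.98):
    sem = standard_error_of_measurement(ASSUMED_SD, r)
    print(f"reliability {r:.2f} -> SEM of about {sem:.1f} I.Q. points")
```

Under this assumed dispersion the median reliability of 0.92 gives a standard error of roughly 4.5 points, which agrees with the representative value cited; a different dispersion would shift the result proportionally.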

Another systematic source of error may lie in details of the 
construction of the age scale. Thus, it might be a characteristic 
of certain age scales that the I.Q. of a superior child decreases 
with chronological age. Strictly speaking, nothing in the defini- 
tion of the I.Q. requires constancy during the entire period of 
development. The constancy of the I.Q., if it exists, is imposed 
by the process of standardization. Therefore, an experimentally 
obtained constancy of the I.Q. proves only that the scale has been 
constructed in such fashion as to produce constant I.Q.’s except, 
of course, that random fluctuations will still be present. When,
and only when, an age scale has the characteristic that I.Q.’s of
individuals at all age levels tend to remain constant, one may
attach significance to the case of the unusual individual whose 
I.Q. does not remain constant. The significance of any such shift 
of I.Q. in time must be judged in relation to the normal shift to 
be expected by random error, or unreliability. 
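
One way to make this comparison concrete is sketched below. It takes the representative standard error of measurement of 4.5 quoted earlier, assumes that the errors on the two testings are independent so that the standard error of a difference is SEM x sqrt(2) (a standard result, not one stated in the text), and subtracts an allowance for practice effect. The function names, the example scores, and the default multiplier are hypothetical.

```python
import math


def shift_in_error_units(iq_first, iq_second, sem=4.5, practice_allowance=0.0):
    """Express a retest shift in units of the standard error of a difference."""
    # Assumes independent errors on the two testings.
    se_difference = sem * math.sqrt(2.0)
    adjusted_shift = (iq_second - iq_first) - practice_allowance
    return adjusted_shift / se_difference


def exceeds_random_error(iq_first, iq_second, multiplier=3.0, **kwargs):
    """True when the adjusted shift is larger than `multiplier` error units."""
    return abs(shift_in_error_units(iq_first, iq_second, **kwargs)) > multiplier


# Hypothetical case: a rise from 100 to 110, with 3 points allowed for practice.
ratio = shift_in_error_units(100, 110, practice_allowance=3.0)
print(f"adjusted shift = {ratio:.2f} standard errors of a difference")
print("beyond random error:", exceeds_random_error(100, 110, practice_allowance=3.0))
```

Only shifts that survive such a check would justify looking for other causes; the multiplier is a matter of judgment and is left as a parameter.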

Increases or decreases of the order of magnitude of three 
times the standard error of measurement must first be tested with 
respect to the magnitude of variable errors of measurement before 
it is legitimate to entertain the hypothesis that some other factor 
such as special therapy, change in environment, or organic change 
is responsible for the shift in I.Q. In addition, obtained increases 
should be scrutinized carefully from the standpoint of possible 
practice effect. All such interpretation is predicated on the basis 
of constant I.Q., as built into the scale itself. One may properly 
inquire just how a scale with constant I.Q. may be constructed. 
In view of the inherent difficulties with the mental age and intel- 
ligence quotient, it is impossible to state any perfectly general 
rules. However, the part of the scale which is treated as if the 
growth curve were linear, viz. two to 13 years, can be abstracted 
for discussion. The problem the test constructor faces is that of 
providing that most individuals will be assigned the same I.Q., 
within the reliability of the scale, every year from two to 12 
inclusive. If the various sub-tests are properly scaled, i.e., 
assigned to a given year level as a median performance of unse- 
lected children of that age, the I.Q. of 100 will remain constant. 
Inconstancy is to be expected in intelligence quotients other
than 100, unless certain conditions are satisfied. One condition is 
that the variability of test performance increases for successive 
age groups from two to 12. If we assume that children of a 
given typical group vary more among themselves as they grow 
older, it is possible to arrange matters so that the I.Q. of most 
individuals will be constant. Suppose that we have an individual 
whose I.Q. at age eight is 120. The mental age from which the 
I.Q. is estimated is 9.6 years (or 9 years, 7 months, approxi-
mately). Let us further suppose that we have a distribution of
mental ages of all the children in our sample of eight-year-olds. 
The mental age of 9.6 years is, say, one standard deviation above 
the mean of the distribution of eight-year-olds. The standard 
deviation of the distribution is 9.6 - 8 = 1.6 mental years.

Now, if the same child is to have an I.Q. of 120 at the age of 
nine, his mental age will then be 10.8 years. If the nine-year-old 
sample is composed of the same children we should expect, except 
for errors in measurement, that the child will have the same posi- 
tion in the nine-year distribution that he had in the distribution 
of eight-year-olds. Since his mental age is now 10.8, the standard 
deviation of the distribution of nine-year-olds is 1.8 mental years.
Similarly, the standard deviation of the ten-year-olds must be 
2.0 mental years; of eleven-year-olds, 2.2 mental years; etc. The 
preceding illustration shows that, for an assumed linear growth 
function, the standard deviations of the mental ages must have 
constant increments for each advancing year. How shall this be 
done? Considering, for sake of simplicity, that we have just six 
sub-tests at each year level, we may increase the standard devi- 
ations of successive year levels by (a) selecting sub-tests which 
have higher intercorrelations at the older age levels, (b) assign- 
ing a larger number of mental months to each sub-test. It will be 
seen at once that the latter is inadmissible since a total of 12 
mental months is assigned at each year level. The conclusion is 
inescapable that the degree of correlation between sub-tests must 
increase steadily with higher age levels if the I.Q. is to be 
constant. 
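
The arithmetic of the illustration, and the part played by sub-test intercorrelation, can be sketched as follows. The first function reproduces the standard deviations derived above for a child who holds an I.Q. of 120 while remaining one standard deviation above the mean. The second applies the standard formula for the standard deviation of a sum of k equally weighted sub-tests with a common standard deviation and a given average intercorrelation. The sub-test standard deviation of one mental month and both function names are assumptions made for illustration.

```python
import math


def required_ma_sd(chronological_age, iq=120.0, z=1.0):
    """SD of mental age (in years) needed at a given age so that a child
    z standard deviations above the mean keeps the stated I.Q."""
    mental_age = chronological_age * iq / 100.0
    # The mean mental age of an unselected group equals its chronological age.
    return (mental_age - chronological_age) / z


def composite_sd(k, subtest_sd, mean_r):
    """SD of the sum of k sub-tests with equal SD and average intercorrelation."""
    variance = k * subtest_sd ** 2 * (1.0 + (k - 1) * mean_r)
    return math.sqrt(variance)


# Required mental-age dispersion at ages eight through eleven.
for age in range(8, 12):
    print(f"age {age}: SD of mental age = {required_ma_sd(age):.1f} years")

# Six sub-tests per year level; the sub-test SD of 1.0 is an assumed unit.
for r in (0.0, 0.2, 0.4, 0.6):
    print(f"mean intercorrelation {r:.1f}: composite SD = {composite_sd(6, 1.0, r):.2f}")
```

With the number of sub-tests and their weights held fixed, the composite dispersion in this sketch can rise only through the intercorrelation term, which is the point of the argument above.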

The foregoing treatment is theoretical and does not imply 
that the authors of any age scale have consciously attempted to 
attain constancy of the I.Q. in such a manner. More probably 
they have taken advantage of the fact that variability of mental 
performance does increase with age. Such increase affects the 
arbitrary units of measurement employed in somewhat unpre-
dictable fashion; it may be sufficient to account for the relative
constancy of I.Q.’s attained in age scales.

It is difficult, however, to account for the attainment of con- 
stancy of I.Q. and, at the same time, approximately equal disper- 
sion at the various age levels. The range of standard deviations 
of I.Q.’s reported for various half-year levels is from 12.5 to 20.7. 
A representative value is 17. The values vary considerably, prob- 
ably because of accidents of scale construction. Certainly the 
values given by Terman and Merrill do not vary systematically 
with age, and the authors assume that the true variability is 
nearly constant from age to age. However, let us consider a 
half-year group as an approximation to “point” age. If all indi- 
viduals within such a half-year group are considered to be of the 
same chronological age, the mental ages are proportional to the 
intelligence quotients, i.e., a plot of M.A. against I.Q. is linear. 
Even for a half-year interval, approximate linearity must hold; 
otherwise the definition of an I.Q. is meaningless. 

It follows that if half-year groups have the same I.Q. disper-
sion, they must have approximately the same mental age disper-
sion. But the mental age dispersions must increase from year to
year in order for I.Q.’s of individuals to be constant. It thus 
appears that two possible properties of the I.Q. are inconsistent 
and not attainable at the same time, in any strict sense. The most 
serious criticism to be directed against Terman and Merrill’s dis- 
cussion of the matter is that they tend to treat the (roughly) 
approximate equality of dispersion of I.Q. at the various age 
levels as experimental facts, as perhaps having psychological sig- 
nificance. The I.Q. is a statistical concept, having the properties 
we put into it by the accidents of cut-and-try scale construction 
or which we force it to have by conscious design. If we postulate 
that our statistical index shall have certain properties, we 
can then construct a test in accordance with our imposed 
requirements. 

The only reservation is that we may possibly have imposed
characteristics which are mutually inconsistent, in which case 
we perforce discover the source of the difficulty. If we fail to 
limit the properties of a statistic by rational design, the vagaries 
of that statistic will be brought to light in subsequent empirical 
studies. The result is the raising of such false issues as the 
constancy of the I.Q. The gist of the matter is that the I.Q. 
can be made to be constant, if that is thought to be a desirable 
property. If the scale is not constructed in such a way as to give 
constant I.Q.’s for (most) individuals, the die is cast and the
I.Q. will not be constant. 


In summary, it may be stated that the age scale technique:

(1) possesses no advantage over group test methods

(2) has no straightforward rationale so that the process
of standardization may proceed without the necessity
for “adjustments”

(3) meets none of the three requirements of real mental
measurement

(4) leads to much useless work in correcting the previous
“standardization”

(5) embodies the mental age and intelligence quotient, both
extremely unfortunate concepts

(6) leads to problems of spurious psychological signifi-
cance, such as the constancy of the I.Q.

(7) makes impossible any solution of the mental growth
function

(8) has led to dubious devices and untenable interpreta-
tions of various sorts, among them “scatter” and meas-
ure of “mental deterioration.”

It is recommended that the age scale technique in its present 
form be abolished in its entirety, and that it be supplanted by 
reliable homogeneous group tests of single functions. The latter 
can be recombined, if desired, into a single index of mental 
capacity based on position in year group. A better procedure is 
to continue work towards some real unit of measurement, to the 
end that departures from normal growth in several functions may 
be discovered and clinically interpreted. If, by reason of demand 
from teacher, parent, or psychiatrist it will seem necessary to 
give a general index of (average) mental level reached, it can be 
done by use of a suitable combination of measures furnished by 
the separate tests. It is desirable, however, to avoid the use of a 
single index of mental level. 


It has been urged in defense of the Binet test that during its 
administration, the trained clinical psychologist has an opportu- 
nity to make observations of the child’s behavior other than that 
required for rating “general intelligence.” It is maintained that 
such observations may have as much value as, or more value than, 
the mental age, in getting a “clear picture of the individual 
tested.” If such clinical insights can be reliably obtained and 
recorded, the obvious desideratum is a standardized interviewing 
technique, to be applied and interpreted entirely separately from 
the measures of primary abilities. 
COUNSELING ON THE BASIS OF INTEREST
MEASUREMENT*


JOHN G. DARLEY 
University of Minnesota 


As the counselor studies his available data on abilities, 
achievement, interests, personality, and background of the stu- 
dent facing him in the interview, he must select a conversational 
starting point that will establish rapport and get the interview 
under way. At some early time he must discuss the student’s 
stated reason for seeking help, and eventually he must interpret 
the interest test data in a manner understandable to the student. 
Assume that the student makes A scores on the occupational keys
for Y.M.C.A. secretary and personnel manager, and B+ for
school superintendent and social science teacher on the Strong
Vocational Interest Blank. Assume that his claimed occupational 
choices are business, engineering, and “executive work.” He feels 
the need of help in making a final occupational choice. 

At the point of interest test interpretation, the counselor can 
make this bald statement: “You have the interests of a Y.M.C.A. 
secretary or a personnel manager!” With minor modifications 
this is probably the standard approach to interpretation. There 
is no more probable way to lose a case than this. It is the least 
effective clinical approach, for the following reasons: 


1. The student’s spoken or unspoken response is usually 
“How can you say that? I never was a Y.M.C.A. secretary 
or a personnel manager!” At this point the counselor must 
backtrack and start a rather incoherent explanation of the 
basis of interest measurement, to his own and the student’s 
confusion. 


2. If the student accepts the statement without raising 
the foregoing issue in some form, the chances are he will 
re-interpret the statement, then or later, to mean that he 
has the ability to be a Y.M.C.A. secretary or a personnel 
manager, and that these are two jobs where his success is 
guaranteed. If any other factors interfere with curricular
or job success, he claims he “was told he would succeed in
these jobs.”

* This article is the first draft of a chapter in a forthcoming mono-
graph entitled: Clinical Aspects and Interpretation of the Strong Voca-
tional Interest Blanks. Other theoretical and interpretive phases of inter-
est measurement are treated more extensively in the monograph, to be
published by the Psychological Corporation.

3. Such statements run the risk of flouting countless 
stereotypes, prejudices, specific dislikes, or misconceptions 
evoked by occupational labels, either in the student or in 
his parents. Very few people know what a personnel man- 
ager does, and there are substantial and not always com- 
plimentary stereotypes about Y.M.C.A. workers. The in- 
teresting fact that these labels are directly at variance with 
the student’s claimed choices operates also to set up re- 
sistances in the student, although the student’s own specific 
choices are also hedged around with favorable stereotypes 
which may be equally invalid, and although there is con- 
siderable evidence on the instability and invalidity of the 
student’s claimed choices. 

4. Such statements run the risk of moving the discus- 
sion too early in counseling to the temporarily irrelevant 
factors of opportunities, salaries, prestige values. The 
counselor is forced to waste precious time giving data (if 
he knows of any) on these points before having established 
an understanding of the interest type being discussed. 

5. Such statements fail to take into account the vital 
factors of levels of ability and past achievement, which
determine the level of future academic achievement most 
probably attainable; educational disabilities affecting edu- 
cational progress in a correct curricular and occupational 
area; amounts of relevant specific aptitudes, in addition to 
level of general scholastic ability; and personality char- 
acteristics related to job success or satisfaction. Specific 
patterns of interest unaccompanied by ability and past 
achievement sufficient to permit curricular competition in 
professional schools occur frequently in counseling, be- 
cause of the relatively low general correlations between 
measured interests and measured abilities or achievement. 

Strong has published correlations of each occupational 
key with an intelligence test.¹ On the original blank the
zero-order correlations range from -.36 to .38. Segel and
Brintle² collected interest test scores, college grades and
achievement test scores from 100 junior college freshmen. 
Using interest test scores for the keys for doctor, lawyer, 
life insurance salesman, personnel manager, and purchasing 
agent, they found only one positive correlation above .40
with selected parts of the Iowa High School Content Ex-
amination: the correlation between engineering interests
and measured achievement in mathematics. Achievement
in mathematics and science correlated .28 and .29, respec-
tively, with measured interests in medicine. Achievement
in English literature, science, and social studies correlated
-.43, -.26, and -.26, respectively, with measured interests
of a purchasing agent. The correlations between subject
matter grades and measured interests were even lower than
those between achievement tests and measured interests.
Grades in mathematics and science correlated only to the
extent of .14 with interests in engineering, while grades in
history correlated -.47 with interests in engineering. The
authors were sufficiently encouraged by these relations be-
tween scholastic accomplishment and interest test scores
derived from studying adult occupational groups to suggest
that “scales for scoring the Strong Interest Tests should
be devised for the principal subject groups in higher sec-
ondary education.” However, the obtained correlations were
so low that the clinician must be extremely careful to keep
interests and abilities or achievement separate in his own
thinking, and to see that there is no such confusion in the
student’s thinking.

1 See Manual for Vocational Interest Blank for Men, original and re-
vised blanks. Palo Alto: Stanford University Press.

2 David Segel and S. L. Brintle, “The Relation of Occupational Inter-
est Scores as Measured by the Strong Interest Blank to Achievement
Test Results and College Marks in Certain College Subject Groups,”
Journal of Educational Research, XXVII (February, 1934), 442-45.

This error in counseling is particularly tragic and in- 
excusable where the occupations being discussed in terms 
of the interest test are those for which society demands 
college training prior to certification for professional com- 
petition. It is equally inexcusable in cases where the occu- 
pation can be entered with or without specific advanced 
training, as in the case of general measured interests in 
business. But in such cases, the counselor can cover his 
error by saying later what he should have explained earlier, 
namely, that in such occupations, success or satisfaction 
in the occupation is still possible even though success or 
satisfaction is not possible in a curriculum which may bear 
some degree of resemblance to the occupation, but which 
is not yet an indispensable prerequisite. This explanatory 
technique can be effectively used in “downgrading” some 
cases. 

6. Such statements also fail to take into account the
problem mentioned earlier³ in regard to the present-day
representativeness of norm groups, as exemplified in the
psychologists’ key.

7. Finally, such blunt statements omit consideration of 
possible changes of specific measured interests which, while 
infrequent, may occur under certain conditions. Strong 
states this position clearly: “Prognostication of future be-
havior cannot safely be based upon the presence or absence
of any single interest, but it does appear that to a consid-
erable degree at least it can be based upon the entire con-
stellation of interests.”⁴ In the article quoted, test-retest
correlations of the specific keys ranged from .59 to .84 over
a five-year interval beginning with the senior year in col-
lege.

3 The monograph from which this chapter is taken discusses the repre-
sentativeness of Strong’s standardizing groups as a factor in interpreta-
tion of the interest test results.

Furthermore, in using the blank with younger students, 
it is usually more important to determine the interest type 
than the specific occupational interest. Carter, Pyles, and
Bretnall⁵ have demonstrated the presence of the types at
the average age of 16.5, whereas Carter and Jones⁶ have
shown that only 17 per cent of tenth-grade students receive 
specific A scores on the keys appropriate to their occupa- 
tional choices. Thus the counselor who uses the test with 
younger cases must remember that the standardizing pro- 
cedure based on levels of scores made by adults may not 
yield an A score to a high-school student on a key within 
the interest type in which he may have a legitimate and 
dominant pattern. 

With this understood, the test becomes clinically useful 
in the age range from about 15 years and up. But the 
counselor who looks only for single A scores cannot make 
effective use of the test in this age range. This difficulty 
would be clearly eliminated if a technique such as standard 
scores could be used as the reporting device for younger 
cases. Then the higher pattern of standard scores within 
an occupational group would stand out more clearly on the 
individual’s profile, where the letter grade scores, based on 
adult norms, do not show intra-individual patterns so 
clearly in younger cases. 

These statements of the ineffective way to interpret interest 
test scores, and the reasons therefor, grow out of bitter clinical
experience. There is fortunately a more effective alternative. 
Suppose, in this hypothetical case, no reference is made to the in-
terest test scores until late in the counseling interview. Suppose,
further, that the counselor draws out of the student, by question- 
ing, the reasons behind the student’s own choices of business, 
engineering, or “executive work.” He will discover much super- 
ficial thinking about jobs, which is in itself important. But he will 
also discover the specific factors leading to the choices: infor-
mation (or misinformation) regarding salary scales and “over-
crowded” or “undercrowded” fields and job duties; satisfaction
expected from the job; self-estimates of strong and weak abilities
or subject-matter fields; evidences of family pressures or tradi-
tions dictating the choices; self-estimates of aspirations and mo-
tives that are operative in the choices; and evidences of out-of-
school experiences shaping the choices.

4 E. K. Strong, Jr., “Permanence of Vocational Interests,” Journal of
Educational Psychology, XXV (1934), 336-44.

5 H. D. Carter, G. K. Pyles and E. P. Bretnall, “A Comparative Study
of Factors in Vocational Interest Scores of High-School Boys,” Journal
of Educational Psychology, XXVI (1935), 81-98.

6 H. D. Carter and Gary G. Jones, “Vocational Attitude Patterns in
High-School Students,” Journal of Educational Psychology, XXIX (1938),
321-35.

Suppose, finally, that the counselor is familiar with the “in- 
terest types” or “interest patterns” growing out of factor analysis 
studies.⁷ The counselor can then direct the questioning at get-
ting the student to evaluate activities which are related to the
interest type and which are within the scope of his experience 
with his environment. Questions can also be used to evaluate 
those experiences contra-indicating the type into which the stu- 
dent’s claimed choices fall. 
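
The author's note on factor analysis studies gives a rule for grading the intensity of an interest type: a primary pattern where A and B+ scores predominate among the keys of the type, a secondary pattern where B+ and B scores predominate, and a tertiary pattern where B and B- scores predominate. A minimal sketch of that rule follows. Requiring a strict majority is a simplifying assumption (the note also allows a plurality), and the assignment of keys to types, the technological grades, and all names in the code are hypothetical.

```python
# Letter-grade bands for each pattern, strongest first.
PATTERN_BANDS = [
    ("primary", {"A", "B+"}),
    ("secondary", {"B+", "B"}),
    ("tertiary", {"B", "B-"}),
]


def classify_pattern(grades):
    """Return the strongest pattern whose band covers a majority of the keys.

    `grades` holds the letter grades earned on the keys of one interest type.
    Returns None when no band covers more than half of them.
    """
    if not grades:
        return None
    for label, band in PATTERN_BANDS:
        in_band = sum(1 for grade in grades if grade in band)
        if 2 * in_band > len(grades):  # strict majority (an assumption)
            return label
    return None


# Hypothetical grouping of keys into types for the student discussed above.
scores_by_type = {
    # Y.M.C.A. secretary, personnel manager, school superintendent,
    # social science teacher (assumed here to fall in one type).
    "welfare or uplift": ["A", "A", "B+", "B+"],
    # Invented grades on technological keys, for contrast only.
    "scientific or technological": ["C", "B-", "C"],
}

profile = {name: classify_pattern(grades) for name, grades in scores_by_type.items()}
print(profile)
```

Checking the bands from strongest to weakest resolves the overlap between them, since B+ counts toward both the primary and the secondary pattern.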

Specifically, in the hypothetical case, unhappy experiences 
with mathematics would contra-indicate the technological inter- 
est type, in which the claimed choice of engineering is included. 
Participation in Hi-Y work and summer camp jobs may be drawn 
out as bits of evidence in favor of the welfare or uplift type in 
which some of the measured interests fall. A discussion of “exec- 
utive work” as a pervasive problem of dealing with people takes 
it out of the claimed realm of a business activity alone. 

Notice that the student has not yet been informed of his own 
specific measured interests. Notice also that the counselor has 
used the test scores in directing his questions to evoke relevant 
experiences and to clarify the student’s thinking about jobs. At 
or near this point, the counselor will be ready to tell the student 
what his basic interest type seems to be, with some chance of 
getting this idea across by saying: “It seems to me that your
basic interests are in helping people or in working with them
in an effort to bring about an improved adjustment, rather than
in technical, impersonal activities, or in piling up a tremendous
fortune.” Then he may discuss specific occupational duties and
labels as representatives of the basic type, phrasing his remarks
somewhat as follows: “These basic interests in helping people
(or working with people) would be satisfied by the job of a per-
sonnel manager, for example, who is responsible for...” (and
then may follow a description of job duties and responsibilities
and types of training) “...; or those same interests would find
an outlet in the type of work that a Y. secretary might do. So
far as training is concerned, these two jobs require somewhat
different types of abilities and aptitudes as we can see in studying
the two curricula involved; therefore, it is important to see how
your abilities and past achievements line up with the two
choices....”

7 Available factor analysis studies establish the qualitatively different
types of interest patterns somewhat as follows: interest in scientific or
technological activities; interest in verbal or linguistic activities; interest
in business contact activities; interest in business detail activities; inter-
est in welfare or uplift activities. The specific occupational keys for
which the men’s interest test is scored may be approximately grouped in
these five categories. To make a clinical determination of the intensity of
the interest type, the following procedure has been used with experimental
verification, including tabulations of frequency of occurrence: for an indi-
vidual student, the primary interest pattern is the interest type within
which he shows a preponderance (majority or plurality) of A and B+
scores on the specific occupational keys; the secondary interest pattern is
the interest type within which he shows a preponderance of B+ and B
scores; and the tertiary interest pattern is the interest type within which
he shows a preponderance of B and B- scores on the specific keys.

In this way the A and B+ scores are introduced as examples
of occupational outlets for the interest type rather than rigid 
occupational prescriptions for this student, and due allowance 
can be made for existing curricular differences. 

The advantages of this clinical procedure are obvious. It 
reduces to a minimum the arousal of resistances growing out of 
stereotypes or prejudices which the student may have about the 
occupational label. It permits the counselor, subject to his own 
imagination and knowledge of jobs, to generalize beyond the 
available keys on the blank and classify other occupations within 
the basic interest type, which is valuable when one realizes that 
there are about 20,000 occupational labels and only 36 occupational 
keys on the revised blank for men, and 17 occupational keys on 
the blank for women. It permits the counselor then to discuss 
levels of ability, achievement, and aptitude required for a wider 
range of jobs within the interest type, and thus it permits read- 
justments of the student’s plans in the light of other pertinent 
data about him. It gives the student a clearer understanding of 
the place of interests in making a vocational choice, because the 
counselor can explicate the student’s responses to his earlier 
questions as they relate to an interest type theory. It reduces 
to a minimum any conflict between the student’s specific choices 
and the counselor’s alternative suggestions, since both the spe- 
cific choices and the alternative suggestions are assigned to 
broader categories of interest types, where the student can more 
easily see his own status in regard to types of occupations. 

The clinical effectiveness of this alternative plan of interpreta-
tion has been demonstrated in the experiences of graduate stu-
dents in supervised clinical training, and in the reaction of trained 
counselors to the plan. Students are less prone to misinterpret 
the outcomes of the interview; parents can see more clearly the 
relevance of specific educational and vocational suggestions made 
by the counselors; greater flexibility is possible in working out 
educational and vocational plans; more satisfaction is expressed 
by students with this form of counseling assistance in their voca- 
tional problems. 

No claim of infallibility, however, is made for the plan of 
interest test interpretation. It is not easy to learn, nor will it 
solve certain student problems of inflexible and over-emotional- 
ized or fixated vocational choices. It requires skillful interview- 
ing and careful explanations. 

There are other aspects in counseling on the basis of interest 
measurement that should be mentioned. The absence of a con-
sistently significant correlation between specific occupational
scores and either ability or achievement has already been men- 
tioned. Yet in these studies certain experimental problems remain 
uncontrolled. Clinicians can cite many cases in which a student 
has substantially improved his college grades when he transfers 
to a curriculum that trains for an occupation which is within his 
basic and primary interest type. Students transferring from 
engineering to business administration, from medicine to journal- 
ism, from chemistry to teaching, and succeeding better after such 
transfers are familiar to all counselors. The grade increment 
cannot be attributed solely to easier academic competition in the 
second curriculum, since the second curriculum may demand no 
less general academic ability than the first, and may demand dif- 
ferent types of special achievements and aptitudes than the first. 

If the interest measurement can be considered an approximate 
quantification of motivational factors, the following experiment 
would be significant. Choose a group of students having a pri- 
mary pattern and a group having a secondary or tertiary pattern 
in the interest type for which a given curriculum offers specific 
occupational training. Match cases from the two groups on the 
basis of scholastic ability. If the primary pattern in the interest 
type denotes more adequate motivation, the group having this 
pattern should earn better grades than its matched group, pro- 
vided no disproportionate factors of disabilities or problems load 
this experimental group. 
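
The matching step and the grade comparison in the proposed experiment can be sketched as follows. The greedy nearest-score matching, the tolerance, the record layout, and every number shown are illustrative assumptions; none of them is taken from an actual study.

```python
from statistics import mean


def match_on_ability(primary, comparison, tolerance=2.0):
    """Pair each primary-pattern student with the closest unused comparison
    student whose ability score lies within `tolerance` points."""
    unused = list(comparison)
    pairs = []
    for student in sorted(primary, key=lambda s: s["ability"]):
        candidates = [c for c in unused
                      if abs(c["ability"] - student["ability"]) <= tolerance]
        if not candidates:
            continue  # no acceptable match, so the case is dropped
        best = min(candidates, key=lambda c: abs(c["ability"] - student["ability"]))
        unused.remove(best)
        pairs.append((student, best))
    return pairs


def mean_grade_difference(pairs):
    """Mean of (primary grade - matched comparison grade) over all pairs."""
    return mean(p["grade"] - c["grade"] for p, c in pairs)


# Hypothetical records: a scholastic-ability score and a grade average.
primary_group = [{"ability": 60, "grade": 2.8}, {"ability": 72, "grade": 3.1}]
comparison_group = [{"ability": 61, "grade": 2.5}, {"ability": 71, "grade": 2.9},
                    {"ability": 90, "grade": 3.4}]

pairs = match_on_ability(primary_group, comparison_group)
print(len(pairs), "matched pairs; mean grade difference:",
      round(mean_grade_difference(pairs), 2))
```

A positive mean difference in real data would favor the motivational interpretation only if the two groups were also comparable in the disabilities and problems mentioned above, which this sketch does not control.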

Furthermore, when any raw score on an interest key above 
-.5 sigma can receive the same A grade, there is some question
of the legitimate use of the Pearsonian correlation in studying 
the relation of occupational interest scores to other and more 
normally distributed variables, such as ability or achievement. 
Correlation ratio or contingency coefficient statistics may be more 
appropriate forms of analysis for these data. It is too early to 
consider occupational interest factors conclusively unrelated to 
curricular factors, in the light of examples to the contrary. 
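
As an illustration of the alternative analysis, the correlation ratio between a letter-grade classification and a continuous variable can be computed from the between-group and total sums of squares. The sketch below uses only the standard library; the function name and the six records are hypothetical and serve only to show the calculation.

```python
from statistics import mean


def correlation_ratio(categories, values):
    """Correlation ratio (eta): sqrt(between-group SS / total SS)."""
    if len(categories) != len(values):
        raise ValueError("categories and values must have equal length")
    grand_mean = mean(values)
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("values have no variance")
    return (ss_between / ss_total) ** 0.5


# Hypothetical letter grades on one key and achievement scores for six students.
letter_grades = ["A", "A", "B+", "B", "C", "C"]
achievement = [62, 58, 55, 50, 41, 45]

eta = correlation_ratio(letter_grades, achievement)
print(f"correlation ratio = {eta:.2f}")
```

Because the letter grades are treated as unordered categories, the statistic does not depend on the assumption of a linear relation between interest scores and the other variable.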


Conversely some counselors can cite cases in which superior 
or adequate grades are earned in a curriculum that trains for a 
vocation included in an interest group where the student has no 
primary pattern. Yet this need not be too alarming in the light of 
subsequent data about the occupational adjustments of graduates. 
Approximately fifty per cent of all the engineering graduates do 
not continue through life in the technical practice of engineering, 
and the chances are good that many in that fifty per cent have 
primary patterns in interest types other than the technical type. 


This leads to a final clinical aspect. The interest type in coun-
seling must be considered in relation to the local institution’s
curriculum organization in educational guidance. Examples are 
more clearly seen in terms of the blank for women. Many women 
make a primary pattern in the interest type which includes the 
secretarial and office worker keys. The normal curricular path in 
college may be the highly theoretical and technical economics of the 
existing school of commerce or business administration. Yet only 
a small proportion of college girls want to swallow this large dose 
of abstruse economic theory. The primary interest pattern is still 
a true picture of the occupational activities that would be satis- 
fying prior to marriage; the curriculum may still be excellent 
for professional specialists in business, but the twain probably 
shall not meet happily. General education courses plus a mini- 
mum of training in basic office skills would solve the problem if 
the institution provides such a curricular organization. Otherwise 
a liberal arts education with a short course in a commercial busi- 
ness college during the summers or after school will provide a 
workable solution. 


Other examples will occur to counselors who are familiar with 
the detailed structure of their curricula as well as the curricular 
labels. The general principle of empiric identification of interest 
type consonant with curricular duties will clarify some confusing 
cases for counselors. 


THE COURSE IN SELF-APPRAISAL AND CAREERS 
OFFERED TO SENIORS IN THE CHICAGO 
PUBLIC HIGH SCHOOLS* 


GRACE MUNSON 
Bureau of Child Study, Chicago Public Schools 


Since February, 1939, seniors in the Chicago high schools have 
been given the opportunity to enroll in a course in Self-Ap- 
praisal and Careers. This course, with its subsequent counseling, 
constitutes the final step in the Adjustment Service. It is the 
culmination of the self-appraisal and educational planning which 
starts early in the elementary grades, is featured in the eighth- 
grade program of articulation between elementary and high 
schools, is an important aspect of the individual counseling at all 
year levels by high-school teachers in their daily adjustment 
periods, and is featured again in the third-year program for a recheck 
on mental abilities and reading achievement. These activities 
are given continuity by the cumulative folder system and are 
supplemented all along the way by the individual service and fol- 
low-up studies of both elementary and high-school adjustment 
teachers collaborating with the Bureau of Child Study psychologists 
and demonstrators, for individual case studies, clinical 
treatment, and consultative service. 

As the Adjustment Service has now been operating in the 
high schools since 1937 and in the elementary schools since 1936, 
cumulative folders of the fourth-year students of September, 
1940, will contain data assembled over a period of three and a 
half years and in some cases longer. Each year the data will 
extend back farther until ultimately the complete school history 
with many successive measures of mental power and achievement 
will be available for the final guidance step. 

Given in the first half of the fourth year, the course in Self- 
Appraisal and Careers enhances and continues the self-appraisal 
of the earlier years by presenting the concepts of mental growth, 
of individual differences, and of the forces of self determination. 
It makes use of a wide range of scientific measuring techniques 
administered by the adjustment teacher or field psychologists for 


* This article is a summary of the description of the course as pre- 
sented in the Superintendent’s Annual Report for the year 1939-40. 
the identification of specific areas of mental power, aptitudes, 
academic masteries, and vocational interests. And it teaches the 
techniques for interpreting the profiled results and the various 
conditioning factors. The self-appraisal section of the course 
gives each student a foundation in elementary psychology as a 
background for developing the techniques which will enable him 
to make continued self-appraisal as his pattern of powers and 
achievements changes with new growth and new experiences. 
The new understandings and newly accumulated data together 
with that assembled through the years are now used for making 
specific immediate plans and tentative future plans for educa- 
tional, vocational and avocational pursuits. 


Career Study Is Dynamic 


The careers section of the course provides studies in specific 
vocational areas using the most recently published books and 
pamphlet series, compilations of current occupational informa- 
tion, regional conferences with selected speakers from repre- 
sentative vocational areas, personal interviews with these 
speakers, radio broadcasts, and tours. Students acquire knowl- 
edge of the historical development of occupations, their social 
significance, legislative controls and significant trends as a back- 
ground for the development of techniques which will prepare 
them to continue the study of vocations on the basis of the new 
experiences and the new skills that may be acquired in the 
changing and diversified world of work. 

The careers section of the course now lacks roots in earlier 
vocational studies comparable with the early development of 
self-appraisal. The problem of adjusting the high-school curric- 
ulum to accommodate such a course for all students earlier than 
the senior year has been studied with great care, since too early 
selection of vocations is detrimental, yet tentative choices should 
govern to some extent educational planning in high school. 
Beginnings have been made by introducing into selected subject- 
matter courses, study units on the vocational implications of a 
particular subject; books on vocations have been added to the 
free-reading book shelf for the first-year English Reading classes, 
and school libraries contain many valuable sources of vocational 
information; the individual counseling at the eighth-grade level 
and successive high-school levels by adjustment teachers, divi- 
sion-room teachers, and particularly by placement counselors 
involves some future vocational planning with the students; but 
the students, having had little opportunity to study careers, are 
unable as yet to contribute intelligently their rightful share of 
the planning. 

The development of a program preliminary to the fourth-year 
course is necessary since an earlier tentative selection of a career 
plan will contribute to good mental hygiene by developing secur- 
ity, responsibility, organization of effort, and growth in self 
determination. In this connection, plans are being formulated to 
drop the third-year testing program to the first half of the second 
year, using the New Chicago Tests of the Primary Mental Abili- 
ties which will yield more diversified data as a basis for self- 
appraisal and counseling; to introduce more specific career 
studies in the second-year curriculum together with a study of 
the total high-school organization of courses and facilities; and 
to establish more clearly defined routines to govern individual 
program-making from year to year by division-room counselors. 

In the senior course the psychological studies and the careers 
studies are presented in somewhat parallel order, one vocational 
area being finally selected for intensive study by each student, 
after the results from the psychological measurements have been 
profiled and interpreted by him. A sample profile is presented in 
Figure I. 
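
The profile a student draws up from his test results can be sketched as follows. This is a minimal illustration, not the Bureau's procedure: the article does not say how its percentile norms were computed, so the mid-rank convention used here, the function names, and the norm-group scores are all assumptions.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting half
    of those who tie with it (the common mid-rank convention)."""
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


def profile_line(label, rank, width=50):
    """One row of a text profile: label, a bar scaled to `rank`, the rank."""
    filled = round(rank / 100 * width)
    return f"{label:<12}{'#' * filled:<{width}} {rank:5.1f}"


# Hypothetical raw scores for a norm group of ten students.
norm_group = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

rank_for_20 = percentile_rank(20, norm_group)
rank_for_15 = percentile_rank(15, norm_group)

print(profile_line("Verbal", rank_for_20))
print(profile_line("Number", rank_for_15))
```

Plotting every test on the same percentile scale is what allows abilities and achievements to be compared across tests whose raw scores are not comparable.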

There is little attempt to match a given profile pattern to a 
particular vocation since scientific research has not been able to 
map the specific mental abilities required for insured success in 
a given vocation, and since an attempt to sort individual students 
into vocational pigeon holes would violate the fundamental dem- 
ocratic principles of public education. Yet wise counseling com- 
bines with student freedom of choice based on a knowledge of 
self and of careers, to give each student the security of tentative 
but specific plans. 

The teachers of the course assist students in the formulation 
of such plans through individual counseling as the course pro- 
gresses, using their non-teaching periods for this purpose. In 
the following semester each student goes over his plans again 
with the placement counselor if he seeks employment immedi- 
ately following graduation, or with the senior counselor or the 
adjustment teacher, selecting his college and his first college 
courses, if he plans to continue his schooling. Most adjustment 
offices maintain a library of college information and scholarship 
data. The adjustment teachers arrange for senior visiting days 
at the junior colleges, and confer with their personnel staffs for 
Figure I 
Profile of the test results for a student in self-appraisal and careers. 
(The form carries blanks for Name, School, Date, Period, and Grade. The 
plotted percentile scale is not reproduced; subtest labels that cannot 
be read in the source are marked [illegible].) 

TESTS                                  Date     P.R.

MENTAL POWER
Primary Mental Abilities               2-40
  1. Perception (P)                             75
  2. Number (N)                                 35
  3. Verbal (V)                                 95
  4. Space (S)                                  99
  5. Memory (M)                                 75
  6. Induction (I)                              98
  7. Deduction (D)                              98

ACHIEVEMENT
Silent Reading (Iowa-Adv.)             2-40
  1. Comprehension                              99+
  2. Directed Reading                           90
  3. [illegible]                                99
  4. Word Meaning                               99
  5. Sentence Meaning                           97
  6. Para. Meaning                              91
  7. [illegible]                                99
  TOTAL COMP.                                   99

HS-Content (Iowa)                      3-40
  1. English, Lit., Grammar                     95
  2. [illegible]                                80
  3. Science (Natural)                          85
  4. History (Soc. Studies)                     92

SPECIAL ABILITIES
Art Judgment (M-S)                     4-40     63

Music (K-D)                            4-40
  [illegible]                                   75
  [illegible]                                   97
  [illegible]                                   94
  Tonal Movement                                94
  [illegible]                                   8
  [illegible]                                   10
  [illegible]                                   96
  [illegible]                                   80
  [illegible]                                   93
  Rhythm Imagery                                85

CLERICAL (N.I.I.P.)                    3-40
  1. Oral Instructions                          40
  [illegible]                                   80
  [illegible]                                   70
  [illegible]                                   80
  [illegible]                                   90
  [illegible]                                   97
  [illegible]                                   98
  [illegible]                                   92

the orientation program and the transfer of data for such students 
as plan to attend. They also assist in organizing “College 
Day” when representatives from local and state colleges and 
universities present the advantages of their institutions. 

The course in Self-Appraisal and Careers has been organized 
and serviced by the Bureau of Child Study and the Bureau of 
Occupational Research subject to the advice and guidance of the 
Assistant Superintendent in Charge of High Schools. Confer- 
ences with principals have determined policies for the outlines 
and content of the course while the teachers have contributed 
many valuable devices and suggestions. Each semester the stu- 
dents have made constructive criticism to improve the value of 
the course for the next group. 

Since the course is a five-hour major elective it has not been 
accessible to all seniors following the old program of high-school 
studies. The new program, which will begin to operate in the 
next semester and which allows wider choice of electives, will 
permit more students to enroll. The course should eventually 
be made available to every fourth-year student. 

The following table shows the enrollments in successive 
semesters since the course was established in February, 1939: 


Enrollments in Self-Appraisal and Careers 

Calendar            No. of Schools   No. of Classes   No. of Students
February, 1939            32               74              2600
September, 1939           32               69              2500
February, 1940            36               80              2800

The course is taught without a textbook since no high-school 
textbook has been written that covers the psychological studies 
selected for the course and since most textbooks on occupations 
are likely to be out of date by the time they are printed. Instead, 
an extensive bookshelf of reference materials for both teachers and 
pupils is supplied, supplemented by current materials on occupa- 
tions. Students thus have an opportunity to read widely in the 
areas of their interests. To make the books more available to stu- 
dents, several books have been unitized for each school, by divid- 
ing them into from 14 to 43 sections re-mounted in manila covers, 
thus introducing a type of individualized instruction. This year 
a set of 10 reprints on psychological topics, written in popular 
vein by eminent leaders in that field, was supplied in class sets 
to each high school. 

Teachers’ lesson plans and outlines have been worked out and 
distributed to all of the schools, modified from semester to semester 
in accordance with suggestions from teachers and pupils. 
This year, in answer to the demands, student work sheets were 
prepared to accompany the teachers’ outlines. They were mimeo- 
graphed and supplied in class sets. 


The Outline of the Course 


The outline of the course, arrived at by successive modifica- 
tions in the light of experience, is given below. It will be revised 
still further as experience indicates the directions in which it 
may be more useful to the students and more adequate in fulfill- 
ing its objectives. Lesson outlines for the teacher have been pre- 
pared for all sections of the course. Special study guides have 
been prepared for the use of students in connection with the 
topics which are starred. Units I and II have been prepared by 
the Bureau of Child Study, Unit III by the Bureau of Occupa- 
tional Research, Unit IV jointly. 


Unit I. *Introduction and Bibliography 
   A. Aims and activities of the course 
   B. Terminology 
   C. Bibliography 

Unit II. Self-Appraisal (To be taught simultaneously with Unit 
III. It is suggested that each week, two days be spent on 
Unit II, two on Unit III, and one on testing. Constant inter- 
weaving should be practiced.) 


   A. Existence of individual differences 
      1. Family history and autobiography 
         a. Racial and cultural background 
         b. Family traits, vocations and achievements 
         c. Health history 
         d. Educational history 
         e. Hobbies 
         f. Social development 
         g. Occupational experiences 
         h. Plans for the future 
     *2. Physical and mental differences between people 
         a. Types of differences 
         b. The total personality 
         c. The normal curve 
         d. Applications to the testing program 
         e. Educational implications 
         f. Vocational implications 
         g. Chicago’s plan for the study of individual differences 
            from the kindergarten through the high school 


  *B. Uses and limitations of standardized tests 
      1. How accurate are the test results? 
      2. How useful are the test results for prediction? 
      3. Do the tests measure all one’s abilities? 
      4. How can the information from them be used most effectively? 
      5. Can the tests designate the one particular job for which 
         each person is exactly fitted? 
      6. Study of tests to be given in this course 
         a. Description 
         b. Why selected 
      7. Chicago plan for the study of individual differences 
         and for the development of techniques of self-appraisal 
         from fourth grade through high school 


   C. Psychological factors that must be considered in the interpretation 
      of test results 
      1. Maturation and change 
        *a. The process of growing up 
            (1) Physical and mental growth 
            (2) Laws of natural growth—infancy to maturity 
            (3) Influence of the environment on growth 
               (a) Effect of frustrations 
               (b) Effect of social environment 
            (4) Adolescence 
            (5) Maturity—the learning ability of adults 
        *b. Individual control of the direction of growth 
            (1) Habits: our masters or our servants 
               (a) Conditioned response 
               (b) Deliberate reconditioning 
            (2) Development of work habits 
               (a) Urge to mastery and completion 
               (b) Urge to self-direction 
      2. Mastering our environment 
        *a. Human drives and obstacles 
            (1) The basic drives 
            (2) Psychological bases for the emotions 
            (3) Motives derived from basic drives 
            (4) The complexity of motives 
            (5) Motives as products of the environment, plus 
                psychological factors 
            (6) The universality of obstacles 
            (7) Drives in career planning 
        *b. Mastery and adjustment—interaction of the individual 
            and the environment 


            (1) Successful adjustment: mastery of one’s problems 
               (a) Adjustment of environment to self 
               (b) Adjustment of self to environment 
            (2) Less successful types of adjustment 
            (3) Self-appraisal and the choice of adjustment 
            (4) Relation to educational and vocational planning 


   D. Individual interpretation of the test results 
      1. Necessary statistical concepts 
        *a. Percentile rank 
         b. Mean 
         c. Median 
         d. Quartile 
      2. Profiles of test results to be made by the student 
         a. Construction 
         b. Interpretation of test data as samplings 
         c. Comparison of abilities and achievements 
      3. Aids in interpretation of individual performance on 
         each of the following tests: 
        *a. Thurstone Primary Mental Abilities 
        *b. American Council on Education Psychology Examination 
        *c. Iowa Silent Reading Test, Advanced 
        *d. National Institute of Industrial Psychology Clerical 
            Examination 
        *e. Cleeton Vocational Interest Inventory 
      4. Aids in interpretation of the completed profile 

Unit III. Careers and Occupations (Simultaneously with Unit II) 
   A. Man’s interdependence in work 
      1. The growth of interdependence 
         a. Primitive methods of work 
         b. Development of specialization 
         c. Effect of specialization 
         d. Discussion of our present highly specialized working world 
         e. Release of human energy for cultural service, and 
            leisure-time activities 
      2. Evolution and importance of occupational groupings 
         a. Development and significance of the merchant guilds 
         b. Development of craft guilds 
         c. Later history of craft guilds 
         d. Present day significance 
            (1) Employee organizations 
            (2) Employer organizations 
            (3) Trade associations 
            (4) Professional organizations 
      3. Socio-economic factors in the study of an occupational area 
         a. Questionnaire study 
         b. Basic attitudes toward occupational rewards other 
            than money 
      4. Legislation affecting workers 
         a. Social Security—old age insurance 
         b. Social Security—unemployment compensation 
         c. Wage and hours laws 
         d. Child labor laws 


   B. Significant relations and trends in occupations 
      1. Classification of occupations 
         a. Importance of study of occupational areas in a broad 
            sense as well as of specific occupations 
         b. Occupational areas vs. occupational fields 
      2. Significance of trends in occupations 
         a. Technological 
         b. Commercial 
         c. Personal and domestic 
         d. Professional and semi-professional 

   C. Study of an occupation 
      1. Relationship between the school subjects and occupations 
         related to those subjects 
      2. Graphs of life earnings 
      3. Case study of an individual 
      4. Intensive study of several selected occupations 
         (check list or outline for occupational study) 
      5. Intensive study of several selected avocations 

   D. Techniques in securing and holding work 
      1. Channels in finding work 
      2. Written application for work 
      3. Making an interview 
      4. Adjusting to a job 

Unit IV. *Summary 
   A. Development of techniques for self-guidance 
      1. Summarizing self-appraisal data 
      2. Summarizing data for study of occupations 
   B. Schedules for counseling during the ensuing semester 
      1. Functions of placement counselor 
      2. Functions of other counselors available in school 
      3. Appointments 
   C. Application of data to the solution of individual problems 
      and the formulation of two plans—one a tentative long-range 
      plan, and one a specific plan for immediate action, 
      both to include provisions for education, vocation, and 
      avocation 
   D. Evaluation of the course 

Student Work Sheets have been prepared to implement the 
outline of Units I, II, and IV. The topics are listed below. Over 
2,000 copies of each were distributed to students during the year 
1939-1940. 

 1. Introduction and Bibliography, 17 pp. 
 2. Physical and Mental Differences, 6 pp. 
 3. Standardized Tests, 7 pp. 
 4. Process of Growing-Up, 15 pp. 
 5. Self-Directed Personality Change, 10 pp. 
 6. Human Drives, 14 pp. 
 7. Mastery and Adjustment, 10 pp. 
 8. Meaning of Percentile Rank 
 9. Primary Mental Abilities, 5 pp. 
10. Mental Power as measured by A.C.E. Test, 4 pp. 
11. Reading Ability as measured by the Iowa Silent Reading 
    Test, Advanced, 5 pp. 
12. Clerical Ability as measured by the N.I.I.P. Test, 4 pp. 
13. Vocational Interest, as indicated by the Cleeton Vocational 
    Interest Inventory, 7 pp. 
14. Summarizing Self-Appraisal Data, 2 pp. 


Battery of Tests for Self-Appraisal Used in 1939-1940 


I. Mental Tests 
   A. *Thurstone Tests for the Primary Mental Abilities 
      1. Perception 
      2. Memory 
      3. Number 
      4. Space 
      5. Verbal 
      6. Inductive reasoning 
      7. Deductive reasoning 
   B. *American Council on Education Psychology Examination 
II. Reading Ability 
   A. *Iowa Silent Reading Test, Advanced 
III. Achievement Tests 
   A. *Iowa High School Content Examination 
   B. American Council on Education Cooperative General 
      Achievement Test 
      1. Mathematics 
      2. Science 
      3. Social Science 
IV. Aptitude Tests, as desired 
   A. Clerical 
      *National Institute of Industrial Psychology Clerical 
      Test, American Revision 
   B. Mechanical 
      1. Detroit Mechanical Aptitudes Examination 
      2. Individual Manipulation Tests 
   C. Musical 
      1. Kwalwasser-Dykema Music Tests 
      2. Seashore Measures of Musical Talent 
   D. Artistic 
      Meier-Seashore Test for Art Judgment 
V. Miscellaneous 
   A. Cleeton Vocational Interest Inventory 
   B. Business Education Council Personality Rating Schedule 
   C. Kuder Preference Record 

* Percentile norms have been prepared by the Bureau of Child Study 
on Chicago groups. 


Bookshelves for Students and Teachers 
The following bookshelves have been set up for students and 
teachers. One set has been furnished to each high school. 
References for Students 


*Blatz, William E. The Five Sisters. New York: W. Morrow and 
Company, 1939. 

Psychological Pamphlets (30 sets to each school) 
*(Reprints from a series of radio lectures published under the title 
of Psychology Today by the University of Chicago Press, 1932.) 

Garrett, Henry E.         Psychology Today 
Goodenough, Florence      Child Development 
Gesell, Arnold            Growth of the Infant Mind 
Watson, John B.           How to Grow a Personality 
Allport, Floyd H.         Personality in Our Changing Society 
Cannon, Walter B.         Effects of Strong Emotion 
Warden, Carl J.           Animal Drives 
Robinson, Edward S.       Learning and Forgetting 
Thorndike, Edward L.      Effects of Rewards and Punishments 
O'Rourke, L. J.           Matching Men and Occupations 


Occupational Books 

†Brewer, John M. Occupations. Boston: Ginn and Company, 1937. 

†Chapman, Paul W. Occupational Guidance. Atlanta: Turner E. 
Smith and Company, 1937. 

†Clark, Harold F. Life Earnings. New York: Harper and Brothers, 
1937. 

†Fleischman, D. E. An Outline of Careers for Women. Garden City: 
Doubleday, Doran and Company, 1935. 

Giles, I. K. Occupational Civics. New York: Macmillan Company, 
1936. 

*Lyons, George J. and Martin, Harmon C. The Strategy of Job Finding. 
New York: Prentice-Hall, Inc., 1939. 

National Resources Committee. Technological Trends and National 
Policy. Washington, D. C.: United States Supt. of Documents, 1937. 

Occupational Outlines on America’s Major Occupations. Chicago: 
Science Research Associates, 1940. 

Rosengarten, W. Choosing Your Life Work. New York: McGraw-Hill, 
Inc., 1936. 

United States Dept. of Commerce Census of Business: 1935. Washington, 
D. C.: Bureau of the Census, January, 1937. Also Census 
of Retail Trade, 1936. 

†Williamson, E. G. Students and Occupations. New York: Henry 
Holt Company, 1937. 

†Ziegler, S. H. and Wildes, Helen J. Choosing an Occupation. 
Philadelphia: John C. Winston Co., 1937, revised edition. 

* Books added during 1939-1940. 
† Books which have been unitized. 


Occupational Pamphlets 
American Job Series. Chicago: Science Research Associates, 1700 
Prairie Avenue. 19 occupational monographs. 
Are There Opportunities for Women? 1936. 10 pamphlets. Changing 
Patterns in Occupations. 1936. 26 pamphlets. New York: National 
Federation of Business and Professional Women’s Clubs, 1819 Broad- 
way. 
Occupational Pamphlets. New York: National Occupational Confer- 
ence. A series of appraisals and abstracts of available literature. 57 
pamphlets. 
Occupational Research Reports. Chicago: National Youth Adminis- 
tration of Illinois, Merchandise Mart. 29 pamphlets. 
Occupational Briefs. Briefs compiled by the National Youth Adminis- 
tration on the occupations included in the reports above. 
Guidance Leaflets. Washington, D. C.: United States Printing Office, 
1936. 19 pamphlets. 
Success—Vocational information series. Directed by Chloris Shade, 
Joliet Township High School. Chicago: Morgan-Dillon and Company. 
55 pamphlets. 

Bibliographical Helps 
Bennett, Wilma. Occupational and Vocational Guidance—A Source 
List of Pamphlet Material. New York: H. W. Wilson Company, 1936, 
revised edition. 

*Massachusetts Youth Administration, Bibliography of Occupational 
and Apprenticeship Information. Boston: 31 St. James Avenue, 1937. 
101 pp. Comprehensive list of magazine articles. 

Parker, Willard B. Books About Jobs. Published for the National 
Occupational Conference by the American Library Association, Chi- 
cago, 1936. 

Price, Willodeen and Ticen, Zelma E. Index to Vocations. New York: 
H. W. Wilson Company, 1936, revised edition. 

Bibliography of References on Vocational Guidance for Girls and 
Women. United States Office of Education. Washington: Vocational 
Division, 1936, revised, 13 pp. Lists bibliographies, studies and in- 
vestigations. 


Vocational Guide. Chicago: Science Research Associates. A monthly 
bibliography of occupational books and articles. 

Research Services 
Occupational Card File on Current Local Data. Chicago: Bureau of 
Occupational Research, Board of Education. 
Cumulative Bulletin Series. Chicago: Bureau of Occupational Re- 
search, Board of Education. 
Special Research Reports. Chicago: Placement Clearance Center, a 
division of the Bureau of Occupational Research, Board of Education. 


References for Teachers 

Psychological Books 
Bingham, Walter V. Aptitudes and Aptitude Testing. New York: 
Harper and Brothers, 1937. 
Paterson, Donald G., Schneidler, Gwendolen G., and Williamson, E. G. 
Student Guidance Techniques. New York: McGraw-Hill, Inc., 1938. 
Shaffer, Lawrence F. The Psychology of Adjustment. Boston: 
Houghton, Mifflin Company, 1936. 
Strang, Ruth M. Role of Teacher in Personnel Work. New York: 
Teachers College, Columbia University, 1936. 

Occupational Books 
Lincoln, Mildred E. A Short List of References on Methods of 
Teaching Occupations. New York: National Occupational Confer- 
ence. Mimeographed, 3 pp. Free upon request. 
Lincoln, Mildred E. and Brewer, John M. How to Teach Occupa- 
tions. Boston: Ginn and Company, 1937. 


The Reactions of Students 


It is too early to obtain an adequate evaluation of the course. 
If the opinion of the students is a criterion (and who doubts that 
it is an important component?) the course is highly successful. 
The reactions of students indicate their deep sense of responsi- 
bility at this level of high school training, the changes in their 
viewpoints engendered by the course, and their gratitude both for 
the new knowledge acquired, and for the personal guidance from 
the fine men and women who have taught the course. A few of 
the student comments are presented below: 

“I feel I have benefited by almost every topic and dis- 
cussion in this course in Careers, but several parts have been 
very helpful. I enjoyed all the tests, and the experience of 
having had them helped me when I applied for a position at 
the Continental Bank. Four tests were required and one was 
almost identical to those we have been taking. All the tests 
at the bank were somewhat like those we have been taking 
and I was much more confident than I would have been, if 
the work had been new to me. Another part of the course 
that I feel has been of great help to me has been hearing 
about various occupations. It is much easier to decide upon 
a vocation for yourself, one that you think you will like, after 
you hear the good and bad points about many vocations. I 
have a much clearer idea of what I would like to be in the 
future than I had before I took Careers.” 

Gloria K., Hirsch High School 


“The survey of different vocations and professions has 
been most helpful to me. It never occurred to me that one 
vocation could branch into so many specific fields. The 
Careers class threw light on many subjects concerning which 
I was in the dark, such as the present and future demands in 
the labor market, the demand of the employer upon his em- 
ployee, and the amount of education needed to get along in 
a vocation.” 

Frances D., Harper High School 


“This course of Self-Appraisal which you are offering is 
very good in building character, citizens, and regular ladies 
and gentlemen. Now I don’t say this just to be your good 
friend because it is everything that I mentioned above and 
more. 

“One of the things that struck me most was the way you 
treat the pupils. Because there is nothing like having a regu- 
lar guy talking to another regular guy. 

“After three and a half years of bumming, cutting, etc., 
this course brought me to my senses. I don’t know what it 
was — whether it was the tests, or the homely philosophy — 
but the course was interesting. 

“And now in ending I want to thank you for making this 
change in me. And later in life I’ll come up and give you a 
visit. Maybe I’ll be a bum on Madison Street or a big shot 
on Michigan Boulevard. I’ll always come and visit the regu- 
lar guy and at the same time ask him for advice.” 


Ted T., Harrison High School 


“Very few of those who finish high school ever sit down 
and take an inventory of themselves. The talks students 
have with the counseling teacher make you gather your wits 
about you and make you think of how to approach your em- 
ployer. Those who are backward and bashful come out of 
their shell, due mostly to the reassurance of the teacher who 
gives them a boost upward.” 

Margaret I., Marshall High School 


“The most important part of the Careers course is the 
making of the Career book on the selected occupations, be- 
cause it helps you find out all about the occupation and to 
make sure you will fit in that line of work. After making my 
book on careers in pattern making, I found I needed to brush 
up on a few technical things. Some students found out that 
they never will or could fit in the occupation they had first 
chosen, and they have had time to make a better choice.” 


Chester L., Steinmetz High School 


“Of all the courses I have had, one of the most important 
subjects for me and for the development of my character, has 
been “Self-Appraisal and Careers.” It is a subject which we 
have to give a lot of thought to, with quite a bit of brain 
work. It has been helpful in many ways: (1) it makes one 
think fully on the future when he or she leaves education and 
goes into the open world; (2) it helps one to know people 
and to understand them much better; (3) it helps one dis- 
cover the vocation into which he will best fit; (4) it helps one 
get a clearer view of the world and of occupations.” 

Kathleen M., Waller High School 


Handbooks and Bulletins 


More complete information concerning the course in Self- 
Appraisal and Careers will be found in mimeographed bulletins 
available from the Board of Education in the City of Chicago. The 
bulletins may be obtained for the cost of mailing (75c) by writing 
to the Bureau of Child Study, 228 N. La Salle St. 


Prepared jointly by the Bureau of Child Study and the Bureau 
of Occupational Research: 


Handbook on Self-Appraisal and Careers, 17 pp. 
Teachers’ Outlines for Self-Appraisal and Careers, 86 pp. 


Prepared by the Bureau of Child Study: 


Student Work Sheets for Self-Appraisal and Careers, 106 pp. 
Handbook of Norms, 30 pp. 

Handbook on Scoring Procedures, 36 pp. 

High-School Teacher’s Devices and Suggestions (Subject: Self- 
Appraisal and Careers. A Bulletin issued by the Superintendent of 
Schools), 19 pp. 

Bureau of Child Study Annual Report, Part V, High-School Self- 
Appraisal and Careers Course, June, 1939; 9 pp. 

Service Bulletin No. 1, 1940, Methods of Presenting the Course in 
Self-Appraisal and Careers. 


Prepared by the Bureau of Occupational Research: 


Cumulative Bulletin Series: 

Series I Educational Facilities, 30 pp. 
Series II Occupational Information, 23 pp. 
Series III Significant Trends, 3 pp. 

Series IV Pertinent Legislation, 11 pp. 
PRIMARY MENTAL ABILITIES AND AVIATION 
MAINTENANCE COURSES* 


WILLARD HARRELL, University of Illinois 
and 
RICHARD FAUBION, Air Corps Technical Schools 


This investigation is the third of a series designed to deter- 
mine the optimal pattern of abilities for mechanical work. The 
first study, “A Factor Analysis of Mechanical Ability Tests” (1) 
suggested that the principal component of the Minnesota series of 
mechanical tests is the Space factor. A second factor, tentatively 
identified as the Perceptual, was present in that battery. A Man- 
ual Agility factor was also isolated. None of the Minnesota tests 
possessed a significant weight for this Agility factor. The most 
practical conclusion from this first study was that certain paper 
and pencil tests will measure equally validly each of the factors 
present in more cumbersomely-administered mechanical tests. 


The second study, “Selection Tests for Aviation Mechan- 
ics (2),” consequently involved only paper and pencil tests. This 
second study was started after the publication of Thurstone’s 
monograph, “Primary Mental Abilities (3),” but was begun before 
his Experimental Battery of Primary Mental Ability Tests (4) 
became available. Nine of the tests from the monograph supple- 
ment were included along with 29 other sub-tests. These were 
taken by 84 basic instruction students of the Air Corps Technical 
Schools. Basic instruction grades from each of five aviation 
maintenance courses with a total duration of eight weeks formed 
external criteria. These course grades were the criteria for both 
the second study and for the third, the subject of this paper. 


Air Corps Technical School students take these five basic 
instruction courses regardless of later specialization in radio, 
photography, airplane mechanics, parachute rigging or other 
advanced specialties. The five basic courses are Shop 
Mathematics, Mechanical Drafting and Blueprint Reading, Air Corps 
Fundamentals, Elements of Metalwork, and Elements of Electricity. 
The names are perhaps sufficiently definitive except for two of 
the courses. Air Corps Fundamentals is unlike the others in that 
it does not entail mechanical problems. It is made up of the 
study of Air Corps rules and nomenclature. Shop Mathematics 
includes the following topics: addition, subtraction, 
multiplication, and division; fractions and decimals; denominate 
numbers and mensuration; formulas and tables; shop trigonometry; 
applied problems. 


* This report is of a study sponsored jointly by the Trade Test 
Department, Air Corps Technical Schools, and the University of 
Illinois’ Graduate Research Committee. The paper was read at the 
Mid-Western Psychological Association, May 4, 1940, at the 
University of Chicago. 

Entrance to the Air Corps Technical Schools is restricted to 
soldiers in the United States Army. Consequently the minimum 
age is 18 years. The education requirement is graduation from 
high school or the equivalent. A minimum Army Alpha percentile 
rank of 75 is required. Percentiles here are based on the Army 
population. The percentile rank of 75 corresponds to an Otis 
I.Q. of 100. 

One hundred and five soldiers, students of the Air Corps 
Technical Schools, were given Thurstone’s Experimental Battery 
of Primary Mental Ability Tests (4), and two additional tests 
found predictive in the previous study (2) — Surface Develop- 
ment and Punched Holes. Army Alpha scores were also available 
since they are used as an entrance requirement. About half the 
group was in the advanced phase of Airplane Mechanics, and the 
other half was in Radio Mechanics. The age range was 18-39 with a 
mode of 19. The range for years of formal schooling was 9-15. 
Sixty-four had completed high school but had gone no further. 

The classification of students into the various advanced phases, 
as well as their selection, might be considered a test problem in 
part, but only the selection angle will be considered here. Re- 
sults are becoming available from tests given to 600 students to 
provide sufficiently large samples to trace the correlation between 
tests and several advanced phases. 

It is recognized that the course grades are not perfect criteria. 
They are complex, but since they consistently correlate signifi- 
cantly with several tests, they probably possess a reasonable 
amount of validity. One objective criterion, a machine shop 
product, has been developed, and it is hoped that it will have 
satisfactory reliability. Other practical criteria and objective 
information criteria are planned. 

The reliability has been estimated by the split-half method for 
each of the sub-tests correlating as high as .30 with a criterion. 
These coefficients are shown in Table III. 
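
The stepped-up split-half estimate mentioned here can be sketched as follows. This is only an illustration: the odd-even split of items and the function names are assumptions, since the text does not say how the halves were formed, and the step-up is taken to be the usual Spearman-Brown formula.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores per person (hypothetical layout).

    Odd- and even-numbered items are summed separately (an assumed split),
    the two half scores are correlated, and the result is stepped up.
    """
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(odd, even))
```

Under this formula a half-test correlation of .75, for instance, steps up to about .86.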

Only two of the seven Alpha sub-tests with significant 
correlations with any grade, namely Addition and Analogies, have a 
reliability above .90. Alpha Arithmetic with reliability of .60 
is lowest. 


Three of the Primary Mental Ability tests, Completion, 
Arithmetic, and Number Series have reliabilities of less than .90 but 
more than .80. An item analysis, showing the correlation of 
each item in the PMA battery with total sub-test score, suggests 
that the relatively low reliability in these three cases is probably 
due in part to the items not being arranged in order of difficulty. 


Four of the PMA tests, Addition, Same-Opposites, Cards, and 
Figures, have reliabilities above .97. These high reliabilities may 
be partially explained by the items within each of the tests being 
practically of equal difficulty. 


Comparison of A. C. T. S. Students with High School Seniors 


A comparison has been made between the PMA scores for Air 
Corps Technical School students and the norms published for 300 
Hyde Park High School (Chicago) seniors. Table IV shows this 
comparison. Critical ratios have been calculated from the 
differences between means. CR’s for Number and Memory are less than 
.30. Hyde Park seniors have higher Perceptual, Verbal, and 
Induction scores. Air Corps Technical School students have 
higher Reasoning and Space scores. 
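
The text does not give the formula for the critical ratios. Most of the tabled values are consistent with dividing the difference between means by the standard error of that difference, computed from each group's own standard deviation and N. The sketch below assumes that formula; the function name is illustrative.

```python
from math import sqrt

def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two means divided by its standard error.

    Assumes independent groups and unpooled variances, which is an
    inference from the tabled values, not a formula stated in the text.
    """
    se_diff = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return abs(mean_1 - mean_2) / se_diff

# Third row of Table IV: Hyde Park (N = 300) against A.C.T.S. (N = 105).
cr_row_3 = critical_ratio(84.5, 27.50, 300, 75.16, 19.95, 105)
```

For the third row of Table IV this gives about 3.72, in agreement with the tabled CR.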


It is difficult to interpret these results because it is not 
possible to say exactly what selective agents are at work in the Air 
Corps Technical School. The most obvious ones are being a 
soldier, choice by a commandant (which presumably means interest 
in mechanical work), completion of high school, and having an 
Army Alpha percentile rank of 75. 


A difficulty with the Reasoning or D score is that one of the 
tests, Mechanical Movements, on which the D score depends, also 
possesses a significant weight in another factor which from an 
unpublished factor analysis by the writers seems to be 
Knowledge of Mechanical Processes. Since the present group is 
selected in part on its interest in, and presumably knowledge of, 
mechanical processes, this selection would increase the Mechanical 
Movements score and consequently the D score, without 
demonstrating that these students are better reasoners than the 
Hyde Park seniors. 


Results with Primary Mental Abilities Tests 


All of the PMA scores were obtained by adding test scores. 
Five of the seven, all but Perception and Memory, correlate 
significantly with at least one of four basic instruction grades. 
Table I lists the product-moment correlation coefficients. A 
significant correlation is considered to be one where the 
coefficient is at least four times its probable error. For 105 cases this 
is a correlation of .24. 
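
The significance rule above can be checked with the usual formula for the probable error of a correlation coefficient, P.E. = .6745 (1 - r²) / √N. The sketch below assumes that formula (the paper does not print it) and reproduces the two probable errors noted under Table I; the function name is illustrative.

```python
from math import sqrt

def probable_error_r(r, n):
    """Probable error of a product-moment r from n cases (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

N_CASES = 105
pe_at_50 = round(probable_error_r(0.50, N_CASES), 2)  # Table I note: .05
pe_at_20 = round(probable_error_r(0.20, N_CASES), 2)  # Table I note: .06

# Four and five times the rounded P.E. near r = .20 give the two
# cut-off values used in the text.
four_pe = round(4 * pe_at_20, 2)
five_pe = round(5 * pe_at_20, 2)
```

With the probable error rounded to .06, four times the probable error is .24 and five times is .30, the two criteria used in the text.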

Elements of Metalwork does not correlate significantly with 
any of the tests, but it did in the second study referred to above. 
The test correlations with Shop Mathematics and with Mechan- 
ical Drafting appear quite similar to those in the previous study. 
In both groups, Addition, Number Series, and Surface Devel- 
opment correlated significantly with Shop Mathematics; and in 
each study Mechanical Movements, Surface Development, and 
Punched Holes with Mechanical Drafting. There are stronger 
correlations with Electricity, and with Air Corps Fundamentals 
in this study than in the second. A possible explanation is that 
the present battery has more Verbal tests; and these correlate 
significantly with each of those two courses. More important is 
that the greater dispersion for age and schooling of the present 
group tends to increase the correlations. 

Looking again at Table I, the Number factor correlates sig- 
nificantly only with Shop Mathematics; Space correlates signifi- 
cantly with Shop Mathematics, and with Mechanical Drafting; 
Induction with Shop Mathematics, Electricity, and Mechanical 
Drafting; while Reasoning and the Verbal score correlate sig- 
nificantly with each of the four basic grades. 

Multiple correlation coefficients using only significant 
zero-order coefficients have been computed between PMA scores and 
each of four basic grades. These may be compared with 
correlations of Alpha total with the four criteria grades. The multiple 
R’s from the factor scores are: .46 with Shop Mathematics; .57 
with Electricity; .60 with Mechanical Drafting; and .36 with Air 
Corps Fundamentals. Corresponding values for Alpha total are 
.31, .47, .30, and .41. These multiple R’s, as well as others to be 
mentioned later, would be expected to be less in other samples by 
the shrinkage effect if the same regression formulas were used. 
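
As an illustration of how such a multiple R arises, the case of Air Corps Fundamentals involves only two significant predictors, Verbal and Reasoning (D). The sketch below uses the standard two-predictor formula with the coefficients read from Tables I and V (.33 and .24 with the criterion, .33 between V and D). The paper does not state its computing method, so this is an assumed reconstruction, and the function name is illustrative.

```python
from math import sqrt

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: zero-order correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)

# Air Corps Fundamentals: Verbal (.33), Reasoning (.24), V with D (.33).
r_fundamentals = multiple_r_two_predictors(0.33, 0.24, 0.33)
```

These inputs give a value that rounds to .36, the multiple R reported for Air Corps Fundamentals.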

The multiple correlation of four factor scores, Verbal, 
Space, Induction, and Reasoning, with a composite basic 
grade obtained by adding grades in Shop Mathematics, 
Electricity, and Mechanical Drafting is .63. Alpha total correlates 
.45 with this same composite. The zero-order correlations with this 
composite grade are given in Table VI. Table V shows the 
intercorrelations of five factor scores. 
Results are shown in Table II for those sub-tests which 
correlate .30 or more with one of the basic grades. This is five times 
the probable error. 


Conclusions 


The Air Corps Technical Schools are planning to supplement 
their test selection in line with these results; and they also expect 
to establish test standards for classification from future studies. 


We have come to the conclusion from this and other studies 
that there is no one separate factor for a mechanical ability. 
Rather, there are several factors which are more or less prom- 
inent in mechanical work, their pattern depending on its type 
and complexity and on the point reached in the learning curve. 


A Perceptual factor, although present in several so-called 
Mechanical Aptitude tests, is probably related to mechanical 
work, to borrow an expression from Holzinger, as an Arti-factor. 
The Verbal factor has been shown to be evident in training for 
mechanical work of relatively great complexity. Among the more 
important factors in mechanical operations are Space, one or 
two Reasoning factors, and Knowledge of Mechanical Processes. 
A Manual Agility factor is present in routine jobs where indi- 
vidual differences depend on the manipulation of objects such 
as nuts and bolts. 


TABLE I 


Product-Moment Correlation Coefficients Between 5 “Primary Mental 
Ability” Scores and 4 Aviation Maintenance Courses for 105 Soldiers* 


                                                        Blue Print     Air Corps 
                                     Shop     Elec-     Reading and    Funda- 
                                     Math.    tricity   Mech. Draftg.  mentals 

14 N (Addition, Multiplication)       37      [?]7         00            11 
17 V (Completion, Same-Opposites)     28       51          37            33 
20 S (Cards, Figures)                 25       17          36            02 
27 I (Letter Grouping, Marks, 
      Number Patterns)                33       29          41            20 
31 D (Arithmetic, Number Series, 
      Mechanical Movements)           26       40          54            24 


P.E.ᵣ = .05 where r = .50 
P.E.ᵣ = .06 where r = .20 


* Decimal points have been omitted before each coefficient. 
TABLE II 


Product-Moment Correlation Coefficients Between 19 Tests and 4 
Aviation Maintenance Courses for 105 Soldiers* 


                                                        Blue Print     Air Corps 
                                     Shop     Elec-     Reading and    Funda- 
                                     Math.    tricity   Mech. Draftg.  mentals 

 1. Alpha Addition                    33       06         -10           -04 
 2. Alpha Arithmetic                  21       35          30            27 
 3. Alpha Common Sense                13       31          06            06 
 4. Alpha Word Opposites              13       43          24            38 
 5. Alpha Mixed Sentences             17       43          23            34 
 6. Alpha Number Series               37       15          24            07 
 7. Alpha Analogies                   26       35          23            39 
12. Thurstone Addition                39       23          08            24 
15. Thurstone Completion              23       47          39            21 
16. Thurstone Same-Opposites          27       45          30            34 
18. Thurstone Cards                   23       18          32            00 
19. Thurstone Figures                 21       12          33            05 
24. Thurstone Letter Grouping         23       25          31            22 
26. Thurstone Number Patterns         30       18          32            05 
28. Thurstone Arithmetic              31       49          49            29 
29. Thurstone Number Series           22       33          39            17 
30. Thurstone Mechanical Movements    08       12          40            10 
32. Thurstone Punched Holes           15       16          41            06 
33. Thurstone Surface Development     35       21          50            02 


* Decimal points have been omitted before each coefficient. 








TABLE III 

Test Reliabilities by the Split-Half Method (Stepped-up) 
N = 103 

 1. Alpha Addition                    .98 
 2. Alpha Arithmetic                  .60 
 3. Alpha Common Sense                .87 
 4. Alpha Word Opposites              .88 
 5. Alpha Mixed Sentences             .80 
 6. Alpha Number Series               .84 
 7. Alpha Analogies                   .93 
12. Thurstone Addition                .98 
15. Thurstone Completion              .80 
16. Thurstone Same-Opposites          .99 
18. Thurstone Cards                   .99 
19. Thurstone Figures                 .99 
24. Thurstone Letter Grouping         .91 
26. Thurstone Number Patterns         .92 
28. Thurstone Arithmetic              .87 
29. Thurstone Number Series           .87 
30. Thurstone Mechanical Movements    .93 
32. Thurstone Punched Holes           .89 
33. Thurstone Surface Development     .95 

TABLE IV 


Comparison Between 300 Hyde Park High School Seniors and 105 Air 
Corps Technical School Students in “Primary Mental Abilities” 


                   Hyde Park Seniors      A.C.T.S. Students 
                   Mean       S.D.        Mean       S.D.        CR 

Perceptual         152        23.75       137.09     16.51      7.04 
Number             119.5      30.00       118.78     27.80      0.22 
Verbal              84.5      27.50        75.16     19.95      3.72 
Space              109.5      35.00       125.23     34.14      4.04+ 
Memory              15.5       7.25        15.65      7.67      0.18+ 
Induction           35.5       9.00        28.46      8.76      7.04 
Reasoning           54.5      18.75        68.85     18.48      6.69+ 

TABLE V 


Product-Moment Correlation Coefficients 
Among Factor Scores for 105 Soldiers 


         N       V       S       I 

V       .31 
S       [?]     [?] 
I       .33     .28     .39 
D       .20     .33     .41     .54 

TABLE VI 


Product-Moment Correlation Coefficients Between Tests and a Composite 
Basic Grade Composed of Shop Math., Electricity, and 
Mechanical Drafting 

N = 105 

 1. Alpha Addition                    [?] 
 2. Alpha Arithmetic                  .34 
 3. Alpha Common Sense                .17 
 4. Alpha Opposites                   [?] 
 5. Alpha Mixed Sentences             .32 
 6. Alpha Number Series               .35 
 7. Alpha Analogies                   [?] 
    Alpha Total                       .45 
12. Thurstone Addition                .29 
14. Number                            .23 
15. Thurstone Completion              .44 
16. Thurstone Same-Opposites          .41 
17. Verbal                            .47 
18. Thurstone Cards                   [?] 
19. Thurstone Figures                 .29 
20. Space                             .34 
24. Thurstone Letter Grouping         .34 
26. Thurstone Number Patterns         .36 
27. Induction                         .44 
28. Thurstone Arithmetic              .53 
29. Thurstone Number Series           .39 
30. Thurstone Mechanical Movements    .26 
31. Deduction                         .50 
32. Thurstone Punched Holes           .31 
33. Thurstone Surface Development     .47 

A COMPARISON OF THE ORIGINAL AND REVISED 
STANFORD BINET INTELLIGENCE SCALES 


MARTIN L. REYMERT AND RALPH K. MEISTER 
The Mooseheart Laboratory for Child Research 


The present study is an attempt to compare the original and 
the revised Stanford-Binet Intelligence Scales. The data have 
been obtained from 440 Mooseheart children, each of whom has 
had from two to nine examinations. The population of tests 
comprises 958 administrations of the original scale and 823 
administrations of the revised. The testing was done by trained 
clinical psychologists of the Laboratory staff. The children are 
all normal and have been drawn from every state in the Union, 
predominantly from the Middle West. 

The following items were recorded: The child’s name, birth- 
date, the date of administration of the test, the I.Q. rating 
obtained, the M.A., the C.A., the basal year score, the highest 
level of success and the amount of scatter. The time interval be- 
tween administrations and the direction and amount of deviation 
from the first I.Q. rating to the second were obtained for each 
pair of successive administrations. 

To compare the equivalence of ratings from one scale to the 
other with the respective reliabilities of the scales, using the 
same population, two groups of children were chosen. All had 
had at least two examinations with the original scale and two 
with the revised. However, Group A of Table I had taken the 
L form of the revised scales first while Group B had taken the 
M form first. 


TABLE I 


Correlations Between the Various Forms of the Stanford-Binet Scales 
for Constant Populations 


                                          Group A                 Group B 
Scales Correlated                    O1O2    OL      LM      O1O2    OM      ML 

N                                     84     84      84       41     41      41 
r                                    .83    .86     .90      .90    .69     .89 
Av. Age at First Test (in years)     8.9   10.4    11.8      9.3   10.3    11.4 
Av. Int. between Tests (in years)    1.6    1.3     1.2      1.0    1.1     1.2 

The reliability coefficients for the original scale in both 
groups are above .80. So are the correlations between the two 
forms of the revised scales which have been considered here as 
analogous to reliability coefficients. The correlations between 
the original scale and each form of the revised, which give an 
estimate of the equivalence of ratings, are not significantly 
different¹ although the correlation with the M form is lower. Thus 
it can be said that the reliabilities for both scales and the 
correlations between both scales are essentially high and equal. 
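
The footnote cites Rider for the test of the difference between correlation coefficients but gives no details of the procedure. A common approach for two coefficients obtained from independent groups is Fisher's z transformation. The sketch below assumes that approach, which may differ from the one the authors actually applied; the function name is illustrative.

```python
from math import atanh, sqrt

def fisher_z_ratio(r_1, n_1, r_2, n_2):
    """Difference between two independent correlations in standard-error units.

    Each r is transformed to z = atanh(r); the standard error of the
    difference is sqrt(1/(n_1 - 3) + 1/(n_2 - 3)).
    """
    se_diff = sqrt(1 / (n_1 - 3) + 1 / (n_2 - 3))
    return (atanh(r_1) - atanh(r_2)) / se_diff
```

For example, `fisher_z_ratio(0.86, 84, 0.69, 41)` compares the OL coefficient of Group A with the OM coefficient of Group B in Table I. The resulting ratio is referred to the normal curve at whatever level of significance is adopted, a level the text does not state.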


Table II, which shows the correlations between the original 
scale and the two forms of the revised, and their respective reli- 
abilities, using all the population that was available with no 
attempt to keep the composition of the groups the same, gives 
estimates that are all high with but one exception. 


TABLE II 


Correlations Between the Various Forms of the Stanford-Binet Scales 
for Populations of Variable Composition 


Scales Correlated                    O1O2   L1L2*  M1M2*   OL     OM     LM     ML 

N                                    118    [?]     85     44    146    116     89 
r                                    .80    .85    .89    .82    [?]    .88    .60 
Av. Age at First Test (in years)    10.3    9.7   10.0   11.8    [?]   10.9   10.3 
Av. Int. between Tests (in years)    1.3    1.9    1.9    2.8    2.7    1.2    1.0 


* An administration of an alternate form was included between the 
two forms correlated. Therefore these correlations are not between 
successive administrations as are the others. 


The estimate of correlation between the M form and the L 
form of the revised scale is significantly lower than any of the 
other estimates. However, since an estimate of this same 
correlation obtained in Group B (Table I) is high, .89, the lower 
coefficient here may be due to the particular sample taken. 


¹ No P.E.ᵣ is given in this study since an improved technique (Rider, 
10, pp. 84-85) has been used to determine whether the difference between 
correlation coefficients is significant. The use of the P.E.ᵣ is a crudely 
approximative method at best, and in this particular case it is erroneous 
since the assumption of a normal distribution of correlation coefficients 
is probably violated. 


Another view of the equivalence of ratings from scale to 
scale can be obtained from a study of the deviations from 
administration to administration. The results presented in Table III, 
where the grouping is according to I.Q. classification, show that 
within each scale and between scales the individuals with the 
lowest I.Q.’s gain most upon retest, those of average I.Q. gain 
some, while those of highest I.Q. actually lose. 


TABLE III 


Deviations Between Ratings in Successive Administrations According 
to I.Q. Classifications 


Scales 
Administered            O1O2                    LM or ML                  OL or OM 
               Below   90 to  110 and    Below   90 to  110 and    Below   90 to  110 and 
I. Q. Level     90      110    above      90      110    above      90      110    above 

N               197     328      66        83     197     124        94     193      50 
N (pos.)        108     155      22       [?]     123      60        69     133      19 
N (neg.)         76     158      42        20      59      60        23     [?]      31 
M (pos.)        6.9     6.5     7.8       7.8     6.6     5.0       9.1     9.9     1.6 
M (neg.)        4.1     5.4     9.1       4.4     3.8     6.1       4.2     5.2     3.0 
|M|             5.3     5.7     8.4       6.5     5.3     5.4       7.7     [?]     2.8 
M              +2.2     +.5    -3.2      +4.4    +3.0     -.5      +5.6    +4.6    -1.3 












We have here the expected tendency for the extremes of the 
distribution to migrate toward the mean with successive retest- 
ings (regression). 
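
The last two rows of Table III appear to follow from the counts and signed means above them: the mean absolute deviation is the total of gains and losses divided by N, and the net mean is gains minus losses divided by N, with unchanged ratings counted as zero. The sketch below assumes that relation, which the text does not state, and checks it against the "110 and above" column for the original scale. The function name is illustrative.

```python
def summarize_deviations(n, n_pos, mean_pos, n_neg, mean_neg):
    """Return (mean absolute deviation, net mean deviation) for a group.

    n: all retest pairs, including those with no change in rating.
    n_pos, mean_pos: number and mean size of gains.
    n_neg, mean_neg: number and mean size of losses (as positive numbers).
    """
    total_gain = n_pos * mean_pos
    total_loss = n_neg * mean_neg
    mean_abs = (total_gain + total_loss) / n
    net_mean = (total_gain - total_loss) / n
    return mean_abs, net_mean

# Original scale, I.Q. 110 and above: N = 66, 22 gains averaging 7.8,
# 42 losses averaging 9.1.
mean_abs, net_mean = summarize_deviations(66, 22, 7.8, 42, 9.1)
```

To one decimal place this gives 8.4 and -3.2, matching the tabled entries. Other columns agree to within rounding of the tabled means.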

In Table IV, where the deviations are classified according to 
the length of interval between tests, the mean of the absolute 
deviations increases as the interval becomes longer in every case 
but one. In that case, the reversal of tendency may be discounted in 
view of the small number of cases (10). It is concluded that the 
longer the interval between successive administrations, the 
greater the discrepancies in the ratings. 

Table V, which gives for both scales the relation between 
the size of deviation and the number of tests taken, shows that 
the mean of the absolute deviations decreases slightly with suc- 
cessive tests for the revised scale and does the same for the 
original scale with one exception. 
TABLE IV 

Deviations in Ratings According to the Length of Interval Between 
Administrations 

Scales 
Administered            O1O2                    LM or ML                  OL or OM 
Interval       Less    1 year  2 years   Less    1 year  2 years   Less    1 year  2 years 
between testing than   to      and       than    to      and       than    to      and 
(in years)     1 year  2 years above     1 year  2 years above     1 year  2 years above 

N                95     460      34        95     297      10        25     164     151 
N (pos.)         53     228       7        56    17[?]      3        15     109     111 
N (neg.)         35     210      26        33     [?]       3         9     [?]      35 
M (pos.)        7.2     6.7     4.6       6.4     6.7    5.[?]      8.9     8.3   10.[?] 
M (neg.)        4.8     5.6     7.2       4.2     4.9     [?]       4.9     5.3     [?] 
|M|             5.8     5.9     6.5       5.2     5.7     [?]       7.2     7.2    9.[?] 
M              +2.2     +.8    -4.6      +2.3   +[?].6   -[?]      +3.6    +3.8    +6.9 

TABLE V 


Deviations in Ratings Between Successive Administrations in Relation 
to the Number of Tests Taken 


                        Original Scale                   Revised Scale 
Deviations      O1O2    O2O3    O3O4    O4O5      L1M2 or M1L2   M2L3 or L2M3 

N                244     166     103      64          133             69 
N (pos.)         109      87      52      28           88             32 
N (neg.)         124      65      48      34           38             33 
M (pos.)         8.3     6.1     5.4     5.7          6.4            4.8 
M (neg.)         6.1     5.8     4.7     5.2          5.4            3.9 
|M|              6.8     5.5     5.1     5.3          5.8            4.1 
M                [?]     [?]     [?]     [?]          [?]            [?] 














In general, with continued retesting, the discrepancies between 
successive administrations tend to become slightly smaller. 


Table VI, which presents the deviations according to the 
chronological age of the individual at the time of the first test, 
shows a different trend of deviations with age for each scale. 
TABLE VI 


Deviations in Ratings According to the Age of the Individual 
at the First Test 


Scales 
Administered            O1O2                    LM or ML                  OL or OM 
               Below   8 to    Above     Below   8 to    Above     Below   8 to    Above 
Age Level        8      10      10         8      10      10         8      10      10 

N               205     210     157        69     103     226        36     100     200 
N (pos.)         97      92      78        42      66     126        17      62     165 
N (neg.)        102     108      68        21      32      87        18      37      40 
M (pos.)        6.2     9.0     7.3       8.0     6.6     6.1       6.2     9.1    10.1 
M (neg.)        6.0     5.5     5.0       5.7     4.2     2.5       6.3     4.7     4.8 
|M|             5.9     6.7     5.8       6.6     5.5     4.3       6.1     7.4     8.8 
M               [?]     [?]     [?]       [?]     [?]     [?]       [?]     3.9     6.9 


With the original scale, the mean of the absolute deviations 
is a maximum in the middle age group; with the revised scale, it 
decreases with age; and, between the original and revised scales, 
it increases with age. In considering the net gain upon retesting, 
it is found that for the original scale there is a small increase in 
net gain with age. For the revised scale there is a decrease in 
the amount of gain, and between the original and the revised 
there is a substantial increase in net gain with age. In no instance 
is there a net loss. 


Changes in Dispersion 


In studying changes in dispersion of the I.Q. distributions 
from test to retest in order to estimate how well the test will 
discriminate between members of a group upon retest, it was 
thought desirable to keep the population in any particular 
comparison constant to avoid any change in dispersion due to a 
change in the composition of the group. Four groups were used. 


Groups A and C show the changes in dispersion on the original 
scale with one and two retests respectively; Groups B and D 
do the same for the revised scale. With successive administrations 
of the revised scales, the standard deviations decreased and in 
Group D this decrease was significant. This is what might be 
expected since there should be a regression toward the mean 
upon retesting. To the extent that there is regression a given test 
discriminates less well between the members of a group upon 
retest. 


TABLE VII 

Changes in the Dispersions of I.Q. Distributions with Retesting 

Group          A                 B                      C                          D 
Scale      O1      O2      L1 or   M2 or      O1      O2      O3      L1 or   M2 or   L3 or 
                           M1      L2                                 M1      L2      M3 

N          [?]     [?]     [?]     [?]        165     165     165     127     127     127 
M          98.0   100.4   100.4   103.1       93.7    93.7    94.5   100.5   103.4   104.7 
σ          13.6    [?]     14.7    14.4       11.0    12.4    12.3    15.1    14.4    14.0 


However, in the original scale, the standard deviation of the 
first test distribution is significantly smaller than 
those of either the second or third administrations. This is true 
for both Groups A and C. These latter results may seem 
contrary to expectation, but it should be remembered that these 
retests occurred a year later on the average and thus the child 
was a year older. It is known that there is an increase in 
variability with mean test performance. In other words, as children 
grow older they tend to be more variable. The operation of this 
factor tended to mask the predilection for the distribution to 
regress toward the mean in the original scale, while in the revised 
scale the regression toward the mean was sufficiently great to 
obscure the opposite tendency. From a practical standpoint, then, 
it appears that with a given group, the discriminal ability of the 
original scale increases slightly upon retest while that of the 
revised scale decreases. 

In the investigation of scatter, this term will be defined as the 
number of age levels through which an individual had to be 
tested to obtain his rating, from the level at which he passed all 
tests to and including the one at which he failed all. His scatter 
as defined above is larger than his range of successes by one age 
level. Scatter is used here as an approximate measure of the time 
taken to administer the test.² In general, the more levels over 
which an individual scatters, the longer it takes to administer the 
test. 

The amount of scatter is limited by the number of age levels 
present in the test and by the age level at which the subject 
obtains his basal year score; e.g., an individual getting a basal 
score at Year XII on the original scale cannot scatter more than 
four test groups since there are no more. For this reason, the 
amount of scatter for both scales has been analyzed according 
to the basal year scores obtained. This arrangement makes 
explicit any limitation of scatter by the ceiling of the test. 


² A more direct measure of the time required, such as the use of a 
stop-watch, could not be employed since this study was obtained from 
records which did not contain such information. 
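
The definition of scatter and its ceiling can be put in a few lines. The list of age levels below is an assumption about the original scale (Years III to X, XII, XIV, and the two adult groups written here as XVI and XVIII); it is not given in the text, though it agrees with the statement that a basal score at Year XII leaves only four test groups. The function names are illustrative.

```python
# Assumed age levels of the original scale; "XVI" and "XVIII" stand
# for the Average Adult and Superior Adult groups.
ORIGINAL_LEVELS = ["III", "IV", "V", "VI", "VII", "VIII", "IX", "X",
                   "XII", "XIV", "XVI", "XVIII"]

def scatter(levels, basal, fail_all):
    """Number of levels from the basal level (all tests passed)
    through the level at which all tests were failed, inclusive."""
    return levels.index(fail_all) - levels.index(basal) + 1

def max_scatter(levels, basal):
    """Largest scatter the scale allows for a given basal level,
    that is, the count of levels from the basal level to the top."""
    return len(levels) - levels.index(basal)

# A child with basal year VIII who first fails every test at Year XII
# has been tested through VIII, IX, X, and XII.
example_scatter = scatter(ORIGINAL_LEVELS, "VIII", "XII")
ceiling_at_xii = max_scatter(ORIGINAL_LEVELS, "XII")
```

Both values come to four in this sketch. The second shows the ceiling effect described above for a basal score at Year XII.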


TABLE VIII 

Amount of Scatter Classified According to the Basal Year 
Scores Obtained 


Basal Year Score 
of Individuals          III    IV     V     VI    VII   VIII    IX     X     XI    XII   XIII   XIV 

Original Scale   N       68    46    97    119    157    130    91    199    -      42    -       0 
                 Mean   5.2   5.1   5.5    5.6    5.8    5.6   5.1    [?]    -     4.0    -      - 
Revised Scales   N       12    19    25    [?]     68    115    87     81    79     58    62    106 
                 Mean   6.1   5.3   5.5    5.9    7.0    7.3   7.4    6.8   6.5    6.0   [?]    4.8 









Table VIII shows that the scatter increases to a maximum in 
the middle range. The maximum scatter is at basal year VII for 
the original scale and basal year IX for the revised scale. It is at 
these points that the ceiling of the test begins to limit the amount 
of scatter. Since there are fewer test groups at the higher age 
levels in the original scale, it might be expected that this ceiling 
would make its influence felt earlier. This is the case. The re- 
vised scale has the greater scatter throughout, probably as a 
result of the increased number of tests in it. At basal age seven, 
this difference in scatter, which has been only slight up to that 
point, increases somewhat. 

According to these results it would seem that the revised scale 
in general takes a longer time to administer. This is in agreement 
with the results reported by Krugman (6). No evaluation can be 
made of this finding, since it is not known to what extent the 
longer testing time results in increased accuracy of the rating 
obtained. 

Inversions in Basal Year Scores 

Inversions in basal year scores, i.e., instances in which an 
individual on a later test makes a lower basal year score than on 
his first, were studied since they cast some doubt upon the assump- 
tion that an individual would answer correctly all those items 
below his basal year level. In the original scales such inversions 
occur in four per cent of the total possible instances (24 times 
out of 589). In the revised scale, they occur in nine per cent of 
the total possible instances (35 times out of 412). This difference 
is not statistically significant. 


Inversions in Item Level 

Inversions in item level, or skips, i.e., instances in which the 
individual failed all the tests at one age level but succeeded on 
one or more tests at a higher level, were noted. The assumption 
in testing is that the individual will not succeed in any test 
beyond the level at which he fails all. This assumption and the 
pressing demand of time economy militate against testing the 
child beyond the level at which he fails all the tests so that these 
skips are not so frequent as they might otherwise be. To the 
extent that the child is not given opportunity to perform on tests 
at a higher level where he sometimes achieves a random success, 
his rating is an underestimation of his true ability. 

In the original scale such skips occurred in four per cent of 
the tests (40 times out of a possible 958). In the revised scales 
they occurred a little less than one per cent of the time (eight 
times out of a possible 823). The difference between these propor- 
tions is significant. Apparently the grouping of tests, from the 
viewpoint of avoiding such skips, has been much better in the 
revised scales. 
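
The report does not say which test of significance was applied to these proportions. As a rough check, the sketch below assumes the critical-ratio method common at the time: the difference between two proportions divided by its pooled standard error, with a ratio of about 3 or more taken as significant. The function name and the choice of a pooled error are assumptions made for illustration, not the authors' stated procedure.

```python
from math import sqrt


def critical_ratio(x1, n1, x2, n2):
    """Difference between two proportions divided by its pooled standard error.

    x1, x2 are the counts of occurrences; n1, n2 are the possible instances.
    The pooled-error form is an assumption; the source names no formula.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


# Counts taken from the text: (original scale, revised scale).
basal_cr = critical_ratio(24, 589, 35, 412)  # inversions in basal year scores
skip_cr = critical_ratio(40, 958, 8, 823)    # inversions in item level (skips)

print(f"basal-year inversions: |CR| = {abs(basal_cr):.2f}")
print(f"item-level skips:      |CR| = {abs(skip_cr):.2f}")
```

Under these assumptions the ratio for inversions in basal year scores falls just short of 3, while the ratio for skips exceeds 4, which is consistent with the conclusions stated in the text.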


Validity of Mental Year Groupings 
To test whether the grouping of test items by mental years is 
such as to represent for each year’s grouping the normal per- 
formance of children of that chronological age, the basal year 
scores were analyzed according to the average age of children 
achieving those scores. Table IX, giving the mean chronological 
ages of children making the various basal year scores, shows that 
the mean ages are in every case significantly higher than the year 
level indicated by the score. 


TABLE IX 
Average Ages of Children Making Various Basal Year Scores 

Basal Year           III    IV     V    VI   VII  VIII    IX     X    XI   XII  XIII   XIV 
Original    N         67   [?]   [?]   [?]   144   124    74   116     -   [?]     -     - 
Scale       M        5.0   [?]   7.2   [?]   [?]   [?]  12.9   [?]     -  13.9     -     - 
            σ        [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]     -   [?]     -     - 
Revised     N          9    15    23    72   [?]   107    87   [?]    71   [?]    64    92 
Scales      M        4.6   5.6   [?]   [?]  10.6   [?]  12.2  13.1   [?]  14.4  15.7  16.4 
            σ        [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?] 

[?] marks a figure that is illegible. 

Within the same basal year group, the mean age for the 
original scale is in no case significantly different from the mean 
age for the revised scale, indicating that though both scales do 
not meet the foregoing criterion for the grouping of test items, 
one is no better than the other. 

Summary 

The data were gathered from 440 normal children who had 
taken a total of 958 original and 823 revised Stanford-Binet exam- 
inations. The results indicate that the reliabilities for both 
scales are high, over .80, and the correlations between scales are 
comparably high. In both scales, children with low I.Q. tend 
to gain more upon retests than do the children of average I.Q. 
while those of above-average I.Q. actually tend to lose upon 
retesting. For both tests, as the interval between successive 
administrations increases, so do the discrepancies between the 
test ratings. For both scales, as more tests are taken, the discrep- 
ancies between later tests tend to be smaller than those between 
earlier tests. 

In the original scale the mean of the absolute deviations is a 
maximum in the middle age range; for the revised scale it de- 
creases with age. 

For the original scale there is a small increase in net gain 
with increasing age. In the revised scale there is a decrease in 
the amount of gain. 

The dispersion of I.Q.’s and therefore the discriminal ability 
of the test increases with successive tests on the original scale; 
on the revised, however, the dispersion decreases. 

The scatter on the revised scales is greater and reaches its 
maximum later than on the original scale. 

Inversions in basal year scores are more frequent in the 
revised scale while skips are more frequent in the original scale. 
Basal year test groupings on either test do not represent the 
normal performance of children of the corresponding age but 
rather of children a year or two older. 
THE PREDICTION OF SCHOLASTIC SUCCESS IN A 
COLLEGE OF MEDICINE 


DEWEY B. STUIT 
University of Iowa 


The prediction of scholastic success in the professional colleges 
is a major personnel problem and one of primary significance to 
the individual, the colleges and society as a whole. Satisfactory 
achievement in the professional courses is the first step toward 
vocational success. Unless an individual can perform satisfactorily 
the work required for a professional degree, the question of ultimate 
vocational success need not be raised. 

If the individual can be informed of his chances for success in a 
professional college before he enrolls it should be of great advan- 
tage to him in terms of time and money saved if he should other- 
wise fail. At the same time it should encourage those who possess 
the necessary ability to make the sacrifices which may be involved. 
The net results should be a better adjusted individual and a more 
competent profession. While these statements apply to all profes- 
sional colleges, they seem particularly pertinent to medicine because 
the period of training is long, the expense to the individual is con- 
siderable, and the welfare of society demands highly competent 
medical men. The present investigation was undertaken to throw 
some light on the problem of predicting success in this professional 
area. 

Specifically, it was the purpose of this study to investigate the 
value of liberal arts grade point averages and certain aptitude test 
scores as predictive indices of success in first year medicine at the 
State University of Iowa.¹ Because of the variations in grading 
standards at different institutions only those students who com- 
pleted all of their undergraduate work at the University and who 
had complete records for one year of work in medicine were included 
in the study. Prior to 1938 standards for admission to the College 
of Medicine required at least two years or 60 semester hours of 
work in an approved college of arts and sciences; after 1938 this 
was changed to three years or 90 semester hours. Also in 1938 the 
required grade point average in liberal arts work was raised from 
2.00² to 2.20. The restrictions imposed made it necessary to select 
students from entering classes as far back as 1934. The number and 
percentages selected from each class are presented in Table I. 


¹ The writer wishes to express his appreciation to Dean E. M. Mac- 
Ewen of the State University of Iowa College of Medicine for making 
available the basic data and to Mr. C. William Applegate, research assist- 
ant in educational personnel, for his contribution to the statistical analyses 
made in the study. 


TABLE I 

Number and Percentage of Students Selected from Various 
Freshman Classes in the College of Medicine 

Year       Total Enrolled*    No. Used    Percentage 
1934            113              34          30.08 
1935            121              40          33.05 
1936            112              21          18.75 
1937            104              33          31.73 
1938             55              14          25.45 
Total           505             142          28.12 





* The total number enrolled in each class includes students completing 
their liberal arts work at other institutions in whole or in part, those 
registered as freshmen for more than one year, and those who withdrew 
in the course of the year. 


The predictive indices available for this group of 142 students 
included Iowa Qualifying Examination scores, Moss Medical Apti- 
tude Test scores, and grade point averages for liberal arts work. 
The Iowa Qualifying Examination, administered to all entering 
freshmen, consists of the Iowa High School Content Examination, 
Iowa Silent Reading Test, Iowa Mathematics Aptitude Test, and 
the English Training Examination. A composite score, consisting 
of a weighted raw score total, is computed for the group of four 
examinations and is used as the score in the Iowa Qualifying Ex- 
amination. 

The purpose of this examination is to assist counselors in their 
advisory work with students and to predict the scholastic success 
of undergraduate students in various colleges and curricula. The 
Moss Medical Aptitude Test is administered to applicants for ad- 
mission to colleges of medicine by the Association of American 
Medical Colleges. In considering the liberal arts work, the total 
grade point average, the “required science” and “total science” 
grade point averages were studied separately. Required science, as 
distinguished from total science, includes 32 hours of prescribed 
courses. The specific subjects prescribed in the liberal arts curricu- 
lum are inorganic chemistry through qualitative analysis, quantita- 
tive analysis, elementary organic chemistry, elementary physics, and 
biological science, usually zoology. 


² Grade point averages or point hour ratios are computed by consider- 
ing A = 4, B = 3, C = 2, D = 1, Fd = 0. 

The criterion of success in first-year medicine consisted of the 
student’s grade point average at the close of the academic year. 
Makeups for subject conditions and incompletes were disregarded. 
It was felt that the grade first assigned in a course should be used 
because it represented a better appraisal of the student’s perform- 
ance in comparison with his fellows. The same practice was fol- 
lowed in computing grade point averages for the second year of 
work. 

The raw scores in the aptitude tests had been converted to per- 
centiles and were thus recorded. It was assumed that these percen- 
tiles were equivalent from year to year. For computational purposes 
the percentiles were converted into linear scores by the use of 
Hull’s table. 
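
Hull's table is not reproduced in this article. The sketch below assumes it follows the usual normal-curve conversion, in which a percentile rank is replaced by the corresponding normal deviate and then rescaled. The scale constants (mean 50, standard deviation 20) and the function name `linear_score` are hypothetical choices for illustration and should not be expected to reproduce Hull's tabled values exactly.

```python
from statistics import NormalDist


def linear_score(percentile, mean=50.0, sd=20.0):
    """Map a percentile rank (0 < percentile < 100) to a linear score.

    Assumed method: inverse normal transformation, rescaled to the given
    mean and standard deviation. The constants are illustrative only.
    """
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100.0)
    return mean + sd * z


for p in (10, 40, 50, 90):
    print(p, round(linear_score(p), 1))
```

The point of such a conversion is that equal steps on the linear scale represent roughly equal steps in ability, which percentile ranks do not, so that means, standard deviations, and correlations can be computed.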


Student Performance 


The performance of the students in the aptitude tests and liberal 
arts work is shown in Table II. The mean linear score of 44.50 in 
the Moss Medical Aptitude Test is equivalent to a percentile score 
of about 40 on nation-wide norms. Data were also available for 240 
additional students who did not meet all of the criteria used in the 
selection of the 142 students included in this study. It will be noted 
that the mean linear score for this group is slightly higher, but the 
range is almost identical. In the Iowa Qualifying Examination and 
its sub-tests, the group is definitely superior as indicated by the 
mean linear scores, but the range is very wide, varying from the 
seventh to the ninety-ninth percentile in the composite score. The 
mean grade point average of these students in liberal arts work is 
also definitely superior. The average point hour ratio for the Col- 
lege of Liberal Arts is about 2.20, while the students in this group 
achieved a 2.60 average. From these data one might conclude that 
the typical student who goes into medicine at Iowa is definitely 
superior in the Iowa Qualifying Examination and in his liberal arts 
work, but he may be somewhat below the average in the Moss Med- 
ical Aptitude Test. 

The second phase of the study was concerned with the relation- 
ship between the various predictive indices and scholastic success 
in first year medicine. The coefficients of correlation expressing 
these relationships were computed by the product-moment method 
and are presented in Table III. 
TABLE II 
Student Performance in the Aptitude Tests 
and 
Liberal Arts Work 

Predictive Index                 N       M       σ      Range 
Moss Medical Aptitude           141    44.50   15.05    Linear Score 13-83 
                                                        Percentile 3-96 
Moss Medical Aptitude           382*   46.00   16.05    Linear Score 0-87 
                                                        Percentile 0-98 
Iowa Qualifying Exam.: 
  1. Composite Score            142    62.29   25.57    Linear Score 21-91 
                                                        Percentile 7-99 
  2. High School Content        142    62.90   15.85    Linear Score 9-91 
                                                        Percentile 2-99 
  3. Math. Aptitude             142    62.65   15.94    Linear Score 21-91 
                                                        Percentile 7-99 
  4. English Training           142    56.59   15.65    Linear Score 9-91 
                                                        Percentile 2-99 
  5. Silent Reading             139    58.75   16.28    Linear Score 22-91 
                                                        Percentile 8-99 
Required Science                142     2.62    .453    1.50‡-4.00 
Total Science                   142     2.60    .398    1.72‡-4.00 
Total Liberal Arts              142     2.60    .376    1.81‡-3.61 
Fresh. Med. P.H.R.              142     2.36    .619    0.53-3.84 
Fresh. Med. P.H.R.              382     2.33    .643    0.53-4.00 
Fresh. Med. P.H.R.              112†    2.45    .524    1.57-3.84 
Soph. Med. P.H.R.               112†    2.16    .528    1.03-3.63 

* A supplementary study was made of 382 students. 

† Students of the group of 142 who completed two successive years. 

‡ The 2.20 requirement was in effect in 1938. Previous to 1930 this had 
been 1.50, and was then raised to 2.00. A few students admitted in 1930 
did not enroll until 1934. Only one had a total grade point average 
below 2.00. 











TABLE III 
Correlation of the Predictive Indices with the Criterion 

                                           N      r     P.E. 

The Iowa Qualifying Examination: 
  Composite Score                         142   .098   .056 
  High School Content Examination         142   .058   .056 
  Mathematics Aptitude Test               142   .108   .056 
  English Training Examination            142   .025   .056 
  Silent Reading Test                     139   .075   .056 
Moss Medical Aptitude Test                142   .226   .054 
Moss Medical Aptitude Test                382   .316   .031 

Liberal Arts Grade Point Averages: 
  Required Science                        142   .419   .046 
  Total Science                           142   .465   .045 
  Total Liberal Arts                      142   .449   .045 
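
The P.E. column of Table III can be checked against the classical formula for the probable error of a product-moment coefficient, P.E. = 0.6745 (1 - r²) / √N. The article does not state the formula it used, so its application here is an assumption; the sketch recomputes three entries of the table from their reported r and N. The function name is hypothetical.

```python
from math import sqrt


def probable_error_r(r, n):
    """Classical probable error of a product-moment correlation coefficient."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# (label, r, N) as reported in Table III.
rows = [
    ("Mathematics Aptitude Test", 0.108, 142),
    ("Moss Medical Aptitude Test", 0.226, 142),
    ("Moss Medical Aptitude Test (supplementary group)", 0.316, 382),
]

for label, r, n in rows:
    print(f"{label}: r = {r:.3f}, P.E. = {probable_error_r(r, n):.3f}")
```

Rounded to three places, these reproduce the tabled values of .056, .054, and .031. A coefficient was conventionally regarded as dependable only when it was several times its probable error, which is why the Iowa Qualifying Examination coefficients are described as very low.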


Inspection of Table III reveals that the Iowa Qualifying Exam- 
ination correlates very low with success in first year medicine, that 
the correlation between the Moss Medical Aptitude Test scores and 
the criterion is hardly significant and that the total science average 
is most closely associated with success in medicine as measured by 
first year grades. Examination of the scatter-diagrams provided 
several clues which may explain certain of the correlations. In the 
Iowa Qualifying Examination only 27 students received linear 
scores below 50, hence seriously restricting the range of talent of 
this group. As a result a majority of the students are concentrated 
in the first and second quadrants, those in the second quadrant hav- 
ing received high scores in the qualifying examination but achieving 
below average in first year medicine. Much the same picture is pre- 
sented for each of the sub-tests comprising the qualifying examina- 
tion. The data suggest a critical linear score of 40 or 45 in the com- 
posite score of the qualifying examination, for only eight students 
with qualifying scores below a linear score of 45 succeeded in mak- 
ing a 2.00 average or better in first year medicine. 

The scatter-diagrams of the Moss Medical Aptitude Test present 
a striking contrast to those of the Iowa Qualifying Examination. A 
significant proportion of the students who score low in the test do 
very well in freshman medicine. As shown in Table II, the average 
grade in first year medicine is 2.36 and in the Moss test the mean 
linear score is 44.50. A total of 29 students or slightly over 20 per 
cent scored below average in the Moss test, but made grades above 
2.40 in first year medicine. The student scoring lowest in the apti- 
tude test succeeded in making a 2.60 grade point average. Poor 
performance in the Moss Medical Aptitude Test does not appear 
to indicate with a high degree of certainty that the student will do 
poorly in medicine at this institution. The scatter-diagrams for the 
382 students present a similar picture. 

Of the indices computed from the students’ liberal arts records, 
the total science grade point average correlates best with scholastic 
success in medicine. However, there are some extreme deviates who 
reduce the magnitude of the coefficient of correlation. For example, 
one student with a 2.00 liberal arts record made a 3.35 grade point 
average in medicine while another with a 3.00 record in liberal arts 
work made only a 1.50 average in medicine. In general, however, 
there is rather close agreement between the grades in the two cur- 
ricula. It does not appear that the liberal arts science record is 
superior in predictive capacity to the student’s general average in 
undergraduate work. When the total science average and Moss 
Medical Aptitude Test scores are combined as predictive indices, 
the multiple correlation is .494. Apparently the Moss test does not 
add greatly to the predictive capacity of the science grades. 
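
For two predictors the multiple correlation follows from the three pairwise coefficients: R² = (r₁² + r₂² - 2 r₁ r₂ r₁₂) / (1 - r₁₂²). The article reports r₁ = .465 (total science) and r₂ = .226 (Moss test) but not the correlation r₁₂ between the two predictors, so the values of r₁₂ used below are hypothetical and serve only to show how little a weak second predictor can add.

```python
from math import sqrt


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlations of each predictor with the criterion.
    r12:    correlation between the two predictors.
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)


R_TOTAL_SCIENCE = 0.465  # reported in Table III
R_MOSS = 0.226           # reported in Table III (N = 142)

# r12 is NOT reported in the article; these values are illustrative only.
for r12 in (0.0, 0.13, 0.30):
    big_r = multiple_r(R_TOTAL_SCIENCE, R_MOSS, r12)
    print(f"assumed r12 = {r12:.2f}  ->  R = {big_r:.3f}")
```

Whatever modest value is assumed for r₁₂, R stays only a few hundredths above the .465 obtained from the science average alone, which is the sense in which the Moss test adds little.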

Supplementary evidence concerning the relation between the 
Moss Medical Aptitude Test scores and liberal arts grade point 
averages on the one hand and success in medicine on the other is 
furnished in Table IV. With one exception the mean grade point 
average in liberal arts work increases as the level of achievement in 
medicine increases. This is also true of the mean score in the Moss 
Medical Aptitude Test but the trend is not as pronounced. It is 
interesting to note that the range of performance as regards pre- 
dictive indices is about the same for all levels of achievement. This 
makes the low correlations less surprising. 


TABLE IV 
Moss Medical Aptitude Test, Mean Linear Scores and Liberal Arts Mean 
Grade Point Averages for Various Levels of Achievement 
in Freshman Medicine 

Freshman 
Medics             Moss Aptitude         Required Science        Total Science        Total L. A. Work 
P.H.R.       N     M      σ    Range     M     σ    Range        M     σ    Range      M     σ    Range 
3.00-4.00   24   52.17  14.93  22-83    2.93  .51  2.00-4.00    2.92  .49  2.00-4.00  2.93  .35  2.16-3.61 
2.50-2.99   34   40.59  14.61  13-73    2.66  .39  1.88-3.38    2.64  .34  2.00-3.22  2.60  .24  2.11-3.12 
2.00-2.49   42   47.60  13.06  19-71    2.65  .40  1.63-3.50    2.63  .28  2.00-3.22  2.65  .22  2.05-3.50 
1.50-1.99   30   40.28  13.22  15-68    2.37  .35  [?]          2.32  .33  [?]        2.35  .34  1.81-3.31 
0.00-1.49   12   37.17  15.92  19-76    2.25  .18  1.75-2.75    2.29  .16  2.00-2.63  2.28  .16  2.00-2.58 

[?] marks a range that is illegible. 


TABLE V 
Student Persistence in the College of Medicine 
at the State University of Iowa 

Year Entered    One Year    Two Years    Three Years    Four Years 
1934               34          30            29             27 
1935               40          35            32             32 
1936               21          19            17 
1937               33          28 
1938               14 





In order to ascertain whether generalizations made concerning 
first year medicine would apply to other years, the freshman and 
sophomore grades for 112 students were correlated. The resulting 
coefficient was .722. It also seemed desirable to know if the students 
who complete one year of work continue beyond that point. The 
results are presented in Table V and seem to warrant the conclu- 
sion that the persistence of students beyond the freshman year is 
very high. It also appears that the first year’s work is strongly 
indicative of later success in the medical school. 


The results of the present study agree very well with those found 
at other institutions in this region. At Minnesota³ the correlation 
between Moss Aptitude Test scores and freshman honor points was 
found to be .27 for the class of 1938 and .22 for the class of 1939. 
Liberal arts grades for these same classes showed correlations with 
freshman grades in medicine of .57 and .46 respectively. In a study 
made of the classes entering the University of Illinois⁴ in 1932 and 
1933 the correlations between liberal arts averages and achievement 
in first year medicine were found to be .49 and .41 respectively. 
Comparable correlations for the liberal arts average in science were 
.57 and .42. The Moss Medical Aptitude Test was administered to 
the class entering in 1932 and correlated to the extent of .42 with 
first year medicine. Not all the reports on the prediction of success 
in medicine published in the Journal of the Association of Amer- 
ican Medical Colleges are in agreement with these findings. Some 
report the Moss test as being superior to the liberal arts grade 
point average in predicting success in medicine while others find 
the reverse to be true. Apparently all agree, however, that aptitude 
tests and the undergraduate grade point averages furnish informa- 
tion which is valuable in selecting students for medical colleges. 


Conclusions 


The data seem to warrant the following conclusions for the pop- 
ulation included in this study or populations which are similar: 

1. Liberal arts grade point averages are the best predictive 
indices of success in first year medicine. Required science, total 
science and total liberal arts work are of about equal value in this 
respect. 

2. The correlation between the Iowa Qualifying Examination 
scores and grades in freshman medicine is very low. However, the 
data do suggest a critical score which might be used by counselors 
in their advisory work with students who are interested in medicine 
as a career. 

3. In this institution the Moss Medical Aptitude Test does not 
predict the student’s level of achievement with high precision. Stu- 
dents scoring low in the test may do very well in medicine. 


³ J. W. Cavett, A. T. Henrici, and S. B. Lindley. “Tests of Medical 
Aptitude at Minnesota.” Journal of the Association of American Medical 
Colleges, XII (September, 1937), 257-68. 


⁴ George R. Moon. “Study of Premedical and Medical Scholastic Rec- 
ords of Students in the University of Illinois College of Medicine.” 
Journal of the Association of American Medical Colleges, XIII (1938), 
208-12. 
4. To predict success in medicine with greater accuracy will 
require tests which distinguish more clearly between various levels 
of ability. It is also possible that the criterion of success will need 
to be defined more precisely. 

5. The counselor should not rely solely upon aptitude test 
scores and grade point averages in advising students about their 
probable success in medicine. Perhaps average achievement in apti- 
tude tests and liberal arts work plus high interest and motivation 
will insure the student’s scholastic success. 
A COMPARATIVE STUDY OF FRESHMAN WEEK TESTS 
GIVEN AT THE UNIVERSITY OF CHICAGO* 


WILLIAM M. SHANNER 
Civil Aeronautics Authority 
and 


G. FREDERIC KUDER 
Social Security Board 


One of the most crucial problems that confronts the educator 
of today is that of correctly advising students as to their educational 
and vocational careers. In conjunction with this problem, educators 
and psychologists have constantly worked to secure more 
reliable and accurate information for use in counseling students. 
Almost every college and university now has an orientation week 
at which time all incoming students are required to take batteries 
of psychological and placement examinations, the results of 
which are used in advising students relative to their educational 
programs. 


In September, 1938, a comprehensive battery of psychological 
and placement tests was administered to the incoming freshman 
class at the University of Chicago with a view toward studying 
the relationship between these tests and succeeding academic 
achievement at the university. Among the tests administered to the 
freshman group were the sixteen sub-tests of the American Coun- 
cil on Education Tests for Primary Mental Abilities, Experi- 
mental Edition; the 1938 Form, College Edition of the American 
Council on Education Psychological Examination; the College 
Entrance Examination Board’s Scholastic Aptitude Examination; 
a physical sciences aptitude test; a social sciences aptitude test; 
Pressey’s Special Reading Test, Form A; Pressey’s Test on Read- 
ing Comprehension, Form A; and a vocabulary test. The physical 
sciences aptitude, social sciences aptitude, and vocabulary tests 
were locally constructed. 


* This study was made while the writers were with the Board of 
Examinations at the University of Chicago. 


The first two years of the University of Chicago are devoted 
to a program of general education. The curriculum includes four 
introductory survey courses in the following fields: biological 
sciences, humanities, physical sciences, and social sciences, and a 
number of elective second-year or sequence courses in specific 
subjects. The typical student takes two of the four general 
courses during his freshman year and the two remaining courses 
during his sophomore year. His program each year is completed 
by one or more of the other courses offered in the college. The 
general courses extend throughout the school year, and achieve- 
ment is measured at the close of the year by means of a six-hour 
comprehensive examination. Attendance at a course is not re- 
quired. The only requirement is the successful passing of the 
comprehensive examination. Many students are advised, upon the 
basis of their performances on the freshman week examinations, 
to attempt a comprehensive examination without taking the 
course. Since all students entering the University of Chicago 
as freshmen are required to take the four general courses, educa- 
tional advisers are confronted with the problem of selecting the 
most appropriate general courses for the educational program of 
the student and advising him as to whether he needs additional 
assistance or should attempt the comprehensive examination with- 
out taking the course.* 

By June, 1939, 501 of the freshmen entering the University in 
September, 1938, had taken one or more of the comprehensive 
examinations for the four survey courses and various sequence 
courses. The grades of the comprehensive examinations are re- 
ported in terms of derived scores having a mean of 20 and a 
standard deviation of 4. The average examination grade of each 
student was found by adding the derived scores for all his com- 
prehensive examinations and dividing by the number of 
examinations. 
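
The article does not describe how the derived scores were obtained. Assuming a simple linear standardization of raw examination scores to a mean of 20 and a standard deviation of 4, the scaling and the averaging step can be sketched as follows; the function names and the sample figures are hypothetical.

```python
from statistics import mean, pstdev


def derived_scores(raw_scores, target_mean=20.0, target_sd=4.0):
    """Rescale raw scores linearly to the target mean and standard deviation.

    Assumed method: z-score each raw score against the group taking the
    examination, then rescale. The source states only the target values.
    """
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]


def average_examination_grade(student_derived_scores):
    """Sum of a student's derived scores divided by the number of examinations."""
    return sum(student_derived_scores) / len(student_derived_scores)


# Hypothetical raw scores for one comprehensive examination.
scaled = derived_scores([50, 60, 70, 80, 90])
print([round(x, 2) for x in scaled])

# Hypothetical student with derived scores on three examinations.
print(average_examination_grade([24.0, 18.0, 21.0]))
```

Because every examination is expressed on the same scale, the average is comparable between students who took different numbers or combinations of examinations.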


Test Scores and Average Grades 


Table I reports the correlation between the various freshman 
week tests and average examination grades. The testing time of 
each examination is also reported. The social sciences aptitude 
test has the highest correlation with average grades (.575). The 
physical sciences aptitude, the American Council Psychological 
Examination, and the College Entrance Examination Board’s 
Scholastic Aptitude Examination have just slightly lower corre- 
lations. All these tests, with the exception of the CEEB Scholas- 
tic Aptitude, require approximately one hour of testing time 
each; the CEEB examination requires two hours’ time. 


* For a comprehensive description of the organization of the first two 
years of the University of Chicago, see Chauncey Samuel Boucher and 
A. J. Brumbaugh. The Chicago College Plan. (Chicago: The University 
of Chicago Press, June, 1940), pp. xiii-413. 

The social sciences aptitude test is essentially a reading test. 
It consists of three selections, one each drawn from the fields of 
economics, sociology, and political science. Each paragraph is 
followed by a number of questions based upon an understanding 
of the materials covered. The test is a revision of a test given 
experimentally the previous year. 

The physical sciences aptitude test consists of (1) a section 
on vocabulary in the field, (2) questions involving the interpre- 
tation of mathematical formulas, and (3) a reading test containing 
chemistry and physics selections. It is the product of a process 
of analysis and revision carried out over a period of years. 


The results of the 16 tests of the Primary Mental Abilities 
battery are reported in terms of seven composite scores, each an 
approximation to a factor. Scores for the following abilities are 
reported for the test: 


Perceptual. This ability, measured by the verbal enumeration 
and identical forms tests, may be described as one’s facility in 
finding detail which is significant to him or detail which he is 
seeking. 


Number. This factor consists of facility with simple numer- 
ical work and is measured by the tests of rapid addition and 
multiplication. 


Verbal. The verbal factor manifests itself in the completion 
and in same or opposite tests. It is roughly the ability to deal 
readily and quickly with verbal materials. 


Spatial. The spatial factor is measured by tests requiring the 
subject to think visually of geometric forms and of objects in 
space. 


Memory. This factor is one’s ability to memorize various mate- 
rials. One test requiring the memorization of initials with names, 
and a second test requiring the association of words with num- 
bers are used in measuring the ability. 


Inductive Reasoning. The induction factor may be described 
as one’s ability to discover some rule or principle in various 
arrangements of material. A numerical, a verbal, and a spatial test 
are used in estimating the ability. 
Deductive Reasoning. The deductive factor may be described 
as facility in formal reasoning. It is measured by tests of arith- 
metic problems, number series, and perception of mechanical 
movements. 


The largest correlation between a composite primary ability 
score and average grades is for the verbal composite (.415) which 
requires 16 minutes of testing time. The remaining correlations are 
much lower. These results are reasonable, since the composite 
scores represent specific abilities, and average grades represent 
general academic proficiency. 


Test Scores and Course Grades 


Table II reports the correlation between various freshman week 
tests and grades for the four general courses. The two largest cor- 
relations (.654 and .648) are for the physical and social sciences 
aptitude tests with the respective general courses. The correlations 
for the American Council Psychological and the CEEB Scholastic 
Aptitude Examination show no statistically significant difference 
for the biological sciences, social sciences, and physical sciences 
general course examinations. However, the correlation between the 
CEEB and humanities is significantly greater than between hu- 
manities and the American Council Psychological Examination. It 
is of interest to note the variations in the size of the correlation 
coefficients for the composite scores of the Primary Mental Abilities 
battery. The two largest correlations for the composite scores are 
between Deduction and grades in the physical sciences, and between 
Verbal and grades in the humanities. One might very well expect 
these phenomena. At the same time, humanities shows a correlation 
of only .071 with the composite Spatial score. 


The degree of independence of the seven composite scores of the 
Primary Mental Abilities battery is reported in Table III, which 
gives the intercorrelations for the seven scores. Slightly over half 
of the correlations in the table are less than .300 and two can be 
considered as zero. These small correlations suggest a considerable 
degree of independence for these scores and that they might well 
measure specific abilities. The intercorrelations among Spatial, In- 
duction, and Deduction scores are all very near .500 and thus give 
evidence of considerable dependence of scores. 

The first line of Table IV reports the multiple correlation co- 
efficients for the combination of the two primary ability composites 
having the highest validities with respect to each of the four general 
courses. The Verbal and Deduction scores, from tests requiring 70
minutes, were combined for all except the humanities course. For 
this course the Verbal and Number scores, from tests requiring 35 
minutes, were combined. The second line of Table IV reports the 
multiple correlations obtained by combining all seven scores of the 
Primary Mental Abilities Tests. These coefficients are not markedly
higher than those obtained from the best two in each case. 
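
As an illustration of how a two-predictor multiple correlation of this kind is obtained, the following minimal sketch applies the standard two-predictor formula, R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2), to the humanities course. It takes the Verbal (.472) and Number (.265) validities from Table II and the Number-Verbal intercorrelation (.250) from Table III. The function name is hypothetical, and the sketch is not necessarily the computing routine the authors used.

```python
import math


def multiple_r_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Humanities: Verbal (.472) and Number (.265), which intercorrelate .250.
humanities_r = multiple_r_two_predictors(0.472, 0.265, 0.250)
print(f"{humanities_r:.3f}")  # 0.496
```

The result agrees with the humanities entry in the first line of Table IV.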


TABLE I


Testing Time Required for Administering Various Psychological and
Placement Tests to the 1938 Freshman Class at the University of Chicago
and the Correlation of These Tests with Average Grades.


                                               Testing Time   Correlation with
Test                                            in Minutes     Average Grades

Perception*                                         20              .117
Number*                                             19              .310
Verbal*                                             16              .415
Spatial*                                            33              .184
Memory*                                             33              .204
Induction*                                          46              .229
Deduction*                                          54              .378
American Council Psychological Examination          56              .523
College Entrance Examination Board’s
  Scholastic Aptitude                              120              .542
Physical Sciences Aptitude                          60              .522
Social Sciences Aptitude                            60              .575
Pressey’s Special Reading                           60              .477
Pressey’s Reading Comprehension                      †              .326
Vocabulary                                          25              .486


* Composite scores for the Thurstone Tests for Primary Mental
Abilities.

† Specific time limits are not given; the students are given the time
necessary to read the entire reading selection and answer the questions.
Approximately 20 minutes are required.


TABLE II


Coefficients of Correlation between Various Psychological and Placement
Tests and the Four Introductory General Courses at the University
of Chicago.


                                        Biological                Physical     Social
Test                                     Sciences    Humanities   Sciences    Sciences

Perception*                                .084         .129        .166        .135
Number*                                    .207         .265        .272        .300
Verbal*                                    .380         .472        .376        .435
Spatial*                                   .225         .071        .139        .131
Memory*                                    .145         .127        .177        .160
Induction*                             [illegible]      .029        .247        .196
Deduction*                                 .418         .190        .485        .427
American Council Psychological
  Examination                              .483         .485        .482        .569
College Entrance Examination Board’s
  Scholastic Aptitude                      .479         .544        .471        .577
Physical Sciences Aptitude                  —            —          .654         —
Social Sciences Aptitude                    —            —           —          .648


* Composite scores for the Thurstone Tests for Primary Mental
Abilities.


TABLE III


Intercorrelations for the Seven Composite Scores for the Thurstone
Tests for Primary Mental Abilities


              Number   Verbal   Spatial   Memory   Induction   Deduction

Perception     .237     .371     .392      .046      .355     [illegible]
Number                  .250     .204      .188      .306     [illegible]
Verbal                           .218      .156      .347     [illegible]
Spatial                                    .077      .490     [illegible]
Memory                                               .170     [illegible]
Induction                                                     [illegible]

TABLE IV


Multiple Correlation Coefficients between Various Combinations of the
Composite Scores of the Primary Mental Abilities Tests and the Four
Introductory General Courses.


Combination of                          Biological                Physical     Social
Composite Scores                         Sciences    Humanities   Sciences    Sciences

Two Best-Predicting Composite Scores       .484         .496        .529        .521
All Seven Composite Scores                 .500         .541        .561        .556


Conclusion


Two rather striking observations may be made on the basis of 
the results reported: 


(1) Marks in the four courses can be predicted by combining 
two fairly short primary abilities measures about as well as 
by using the one-hour American Council Psychological Ex- 
amination or the two-hour scholastic aptitude test of the 
College Entrance Examination Board, both of which were 
constructed for the purpose of predicting scholarship. This 
result is the more remarkable since the Primary Abilities 
Tests were not specifically constructed for the purpose of 
predicting grades. 


(2) Tests developed for the specific situation are in the present 
case more efficient prognostic measures than any other 
single test or combination of tests studied. The validities 
of the aptitude tests for the physical sciences and the social 
sciences are significantly higher than the other validities 
obtained. 


These two results appear to be essentially contradictory. One 
of them argues for the development of a number of relatively inde- 
pendent measures and the use of those which, in combination, are 
most efficient for the prediction of any selected criterion or group 
of criteria. The other seems to indicate that tests constructed and 
revised in the light of analysis with respect to the local situation 
are most effective, at least when compared with two of the better 
scholastic aptitude tests constructed for general use. This conclusion
is valid for the tests studied in their present state of development.
However, it is apparent that what can be measured in composite
tests, such as the aptitude tests in the fields of the social and
physical sciences, can be measured by a number of more specific
and relatively independent measures. The difference between the
predictive efficiency of the primary abilities measures used as com- 
pared with the specific aptitude tests must be attributed to the fact 
that the former do not sample some of the attributes included in 
the latter. The fact that the Primary Mental Abilities Tests in 
combination produce fairly high validities although they were not 
developed for the purpose of predicting scholarship is indicative of 
the promise in this type of measurement. As the experimental tests 
of primary abilities are perfected and expanded to include other 
abilities involved in scholastic success, it is reasonable to expect 
combinations of them to approach and equal the validities of tests 
constructed for each of a number of specific situations. This devel- 
opment will make practical a much more efficient use of test mate- 
rial when a number of criteria are to be predicted. 


NOTE ON A SIMPLIFIED METHOD OF COMPUTING
TEST RELIABILITY


C. J. HOYT
University of Minnesota 


Kuder and Richardson¹ have presented the theoretical background
as well as useful formulas for a new and improved procedure
for estimating the coefficient of test reliability. In a later
paper² they have labeled their procedure “the method of rational
equivalence.” Their results appear to have a number of important
advantages over the split-half correlation method used in conjunction
with the Spearman-Brown formula. With the split-half
method the obtained coefficient may be an overestimate or an
underestimate of the actual reliability. With the method of
rational equivalence the estimate derived is known to be never
an overestimate.³ This fact alone is sufficient for recommending
the displacement of the split-half procedure, although there are
other advantages, as pointed out below.
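
The dependence of the split-half estimate on the way the test is divided can be seen in a small sketch. The response matrix below is invented purely for illustration (six subjects, four items scored 1 or 0), and the function names are hypothetical. Each half-test correlation is stepped up with the Spearman-Brown formula, 2r / (1 + r).

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(responses, half_a, half_b):
    """Correlate two half-test scores and apply the Spearman-Brown formula."""
    scores_a = [sum(row[i] for i in half_a) for row in responses]
    scores_b = [sum(row[i] for i in half_b) for row in responses]
    r_half = pearson(scores_a, scores_b)
    return 2 * r_half / (1 + r_half)


# Hypothetical data: one row per subject, one column per item.
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

odd_even = split_half_reliability(RESPONSES, [0, 2], [1, 3])
first_last = split_half_reliability(RESPONSES, [0, 1], [2, 3])
print(f"odd-even split:   {odd_even:.3f}")    # 0.479
print(f"first-last split: {first_last:.3f}")  # 0.667
```

The same six papers thus yield two different reliability estimates, depending only on how the items are assigned to halves.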


The theoretical soundness of the Kuder-Richardson derivation 
is indicated by the fact that analysis of variance techniques 
applied to this problem produce an identical formula. The present 
writer’s derivation, using an approach entirely different from that 
used by Kuder and Richardson, will appear elsewhere. 


The use of the formula recommended by the authors for general
use requires only the same primary data as are ordinarily
obtained in a careful analysis of a test. Consequently, it is not
necessary to obtain the scores on separate parts of the test. The
possibility of obtaining varying results with different methods
of dividing the test is also obviated. The computations required
to obtain the coefficient of reliability by the method of
rational equivalence can be performed in a few simple steps that
do not require any special statistical knowledge.


¹ G. F. Kuder and M. W. Richardson. “The Theory of the Estimation of
Test Reliability.” Psychometrika, II (1937), 151-60.

² M. W. Richardson and G. F. Kuder. “The Calculation of Test Reliability
Coefficients Based on the Method of Rational Equivalence.”
Journal of Educational Psychology, XXX (1939), 681-87.

³ This statement is strictly true for the population used. Sampling
errors are, of course, not eliminated. For a discussion of sampling errors
the reader is referred to Robert W. Jackson, “Reliability of Mental Tests.”
British Journal of Psychology, XXIX (1939), 267-87.


The increasing use of the method of rational equivalence for 
the estimation of test reliability leads the writer to describe a 
procedure which he has found to be particularly efficient. Al- 
though Kuder and Richardson present a number of formulas 
involving various degrees of rigor, they recommend their formula 
(20) for general use. Their empirical findings and those of a num- 
ber of others who have been using the method indicate that the 
results obtained from their formula (20) closely approximate 
those obtained by the more rigorous formulas. The steps outlined 
below have therefore been developed for a variant of the recommended
formula.⁴


1. Score the tests for the number of right answers. Obtain
the sum of these scores for all the subjects. This value
is T in formula (1) below.

2. Square each of these scores and obtain the sum of these
squares for all the subjects. This sum is S_s in the formula
below.

3. Make a tally of the test responses to each item and
obtain the count of the number correct for each item.
The total of these counts should equal the T obtained
in step 1.

4. Square the count obtained for each item and obtain the
sum of these squares for all the items. This sum is S_i
in the formula below.

5. Using the values obtained in the steps above, solve the
following formula for r_tt, the reliability of the test. In
this formula, k is the number of subjects taking the test
and n is the number of items in the test.


$$ r_{tt} = \frac{n}{n-1} \cdot \frac{k S_s + S_i - T(T + k)}{k S_s - T^2} \qquad (1) $$
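
The five steps can be sketched in a few lines of code. This is only an illustrative sketch: the function names are hypothetical, the response matrix is invented (six subjects, four items scored 1 for right and 0 for wrong), and exact fractions are used so that no rounding enters before the final division.

```python
from fractions import Fraction


def sums_from_responses(responses):
    """Steps 1-4: return (n, k, T, S_s, S_i) from a subjects-by-items 0/1 matrix."""
    k = len(responses)        # number of subjects
    n = len(responses[0])     # number of items
    scores = [sum(row) for row in responses]
    T = sum(scores)                                   # step 1
    S_s = sum(score * score for score in scores)      # step 2
    counts = [sum(row[j] for row in responses) for j in range(n)]
    if sum(counts) != T:                              # check described in step 3
        raise ValueError("item counts do not total T")
    S_i = sum(count * count for count in counts)      # step 4
    return n, k, T, S_s, S_i


def reliability_from_sums(n, k, T, S_s, S_i):
    """Step 5: formula (1). Fails if all subjects have the same score."""
    numerator = n * (k * S_s + S_i - T * (T + k))
    denominator = (n - 1) * (k * S_s - T * T)
    return float(Fraction(numerator, denominator))


# Hypothetical data: one row per subject, one column per item.
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

toy_sums = sums_from_responses(RESPONSES)
toy_r = reliability_from_sums(*toy_sums)
print(toy_sums)         # (4, 6, 12, 34, 42)
print(round(toy_r, 3))  # 0.667
```

When T, S_s, and S_i are already at hand from a scoring machine or adding machine, `reliability_from_sums` can be called directly with those totals.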





In the use and analysis of a test some of these steps will
already have been performed. Use of the item counter of the
International Scoring Machine will greatly facilitate step 3. If
a computing machine is not available, steps 2 and 4 can be
greatly facilitated by the use of a table of squares and an ordinary
adding machine.


⁴ While the present paper was in press, Paul L. Dressel published other
variants of the Kuder-Richardson formulas. Formula (1) above is equivalent
to formula (4) of Dressel’s paper. Paul L. Dressel, “Some Remarks
on the Kuder-Richardson Reliability Coefficient.” Psychometrika, V
(1940), 305-10.


The use of formula (1) will be illustrated in a particular ex- 
ample involving a test of 250 items administered to a group of 
33 students in the College of Pharmacy at the University of Min- 
nesota. The values obtained in this case were as follows: 





$$ S_i = 112{,}873 \qquad T = 4829 \qquad S_s = 727{,}351 $$

$$ r_{tt} = \frac{250}{249} \cdot \frac{33(727{,}351) + 112{,}873 - 4829(4829 + 33)}{33(727{,}351) - (4829)^2} $$

$$ = \frac{250}{249} \cdot \frac{636{,}858}{683{,}342} = \frac{159{,}214{,}500}{170{,}152{,}158} = .936 $$


Formula (1) is algebraically equivalent to formula (20) presented
by Kuder and Richardson. Their formula (20) is as follows:


$$ r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma^2 - n\,\overline{pq}}{\sigma^2} = \frac{n}{n-1} \cdot \frac{\sigma^2 - \sum p_i q_i}{\sigma^2}, $$


where σ is the standard deviation of the distribution of test
scores, p_i is the proportion of students passing each item taken
in turn, and q_i is the proportion failing that item.
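
The equivalence can be verified in a few lines. The sketch below assumes that the variance of the test scores is computed with divisor k (not k − 1). It introduces two symbols that do not appear in the note itself: x_j for the score of subject j and c_i for the number of subjects answering item i correctly, so that the sum of the x_j and the sum of the c_i both equal T, the sum of the squared x_j equals S_s, and the sum of the squared c_i equals S_i.

```latex
\begin{align*}
\sigma^2 &= \frac{1}{k}\sum_j x_j^2 - \Bigl(\frac{1}{k}\sum_j x_j\Bigr)^2
          = \frac{S_s}{k} - \frac{T^2}{k^2}
          = \frac{k S_s - T^2}{k^2}, \\
p_i &= \frac{c_i}{k}, \qquad q_i = 1 - p_i, \\
\sum_i p_i q_i &= \sum_i \frac{c_i}{k} - \sum_i \frac{c_i^2}{k^2}
          = \frac{T}{k} - \frac{S_i}{k^2}
          = \frac{kT - S_i}{k^2}, \\
\sigma^2 - \sum_i p_i q_i &= \frac{k S_s - T^2 - kT + S_i}{k^2}
          = \frac{k S_s + S_i - T(T + k)}{k^2}, \\
r_{tt} &= \frac{n}{n-1} \cdot \frac{\sigma^2 - \sum_i p_i q_i}{\sigma^2}
          = \frac{n}{n-1} \cdot \frac{k S_s + S_i - T(T + k)}{k S_s - T^2}.
\end{align*}
```

The factor 1/k² cancels between numerator and denominator in the last line, which is why formula (1) needs only the raw sums and no division until the final step.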


It should be remembered that this procedure is no more ap- 
plicable to speed tests than is the Spearman-Brown formula. 


MEASUREMENT ABSTRACTS


Bedell, Ralph. “Scoring Weighted Multiple Keyed Tests on the 
IBM Counting Sorter.” Psychometrika, V (1940), 195-201. 


Tests or personal inventories with differential item response 
weights may be scored by means of punch card equipment. De- 
tailed instructions are given for preparing the cards and scoring 
the forms. The scoring speed is approximately four to eight times 
that attained by manual scoring. (Courtesy Psychometrika.) 





Blakey, Robert. “A Re-Analysis of a Test of the Theory of Two 
Factors.” Psychometrika, V (1940), 121-36. 


The study of William Brown and William Stephenson, “A 
Test of the Theory of Two Factors,” is re-analyzed by means of 
the Thurstone multiple factor methods. No tests or correlations 
are left out of the original table of correlations as is done in the 
original analysis in an attempt to validate the two-factor theory. 
Space, verbal, and perceptual speed factors similar to those found 
by Thurstone, Wright, and Garrett are identified. A common 
factor of “Maturation” is postulated to account for the remain- 
ing communality of the tests. A fifth factor is considered to 
have no significance due to the small amount of variance which 
it contributes to the total. (Courtesy Psychometrika.) 





Blum, M. L. “A Contribution to Manual Aptitude Measurement 
in Industry: the Value of Certain Dexterity Measures for the 
Selection of Workers in a Watch Factory.” Journal of Ap- 
plied Psychology, XXIV (1940), 381-416. 


Job analysis of watch assembling suggested the importance of 
the ability to make fine finger movements, the ability to handle 
tweezers, and the ability to continue to perform delicate tasks 
without increasing tension or maladjustment. Three criteria of 
proficiency were established: length of employment, salary ratio, 
and foremen’s ratings. Two hundred and fifty-eight women (37 
workers, 137 applicants before being hired, 84 applicants after 
being hired) were examined with the O’Connor Finger Dexterity 
and Tweezer Dexterity tests. Time scores showed the highest 
prediction of the proficiency criteria. The practical value of crit- 
ical time scores on the dexterity tests was indicated. W. A. 
Varvel. 


Cattell, R. B. “A Culture-Free Intelligence Test.” Part I. Journal 
of Educational Psychology, XXXI (1940), 161-79. 


A common source of error in the Binet-Simon type of test 
arises from the influence of academic experience and general 
cultural background. Instead of sampling the “common knowl- 
edge” of the subject, the test emphasizes the perception of rela- 
tions inherent in objects and processes common to a wide range 
of cultural groups. One hundred multiple choice scaled items are 
chosen from mazes, series, classifications, progressive matrices 
(3 types), and mirror images. The progressive matrix test con- 
sists of combined analogy and progressive series items. Harold 
Bechtoldt. 





Dunlap, Jack W. “Problems Arising from the Use of a Separate 
Answer Sheet.” Journal of Psychology, X (1940), 3-48. 


The use of a separate answer sheet has been considered in 
terms of validity and reliability of the more conventional type 
of response. Underlining, marking parentheses, marking sep- 
arate answer sheets using serial and repetitive numbering for 
choices with the answer sheets of both articulated and non- 
articulated types lead to practically identical results. Com- 
parisons were made in terms of means, standard deviations, relia- 
bilities, and validity of test results for both fourth- and eighth- 
grade pupils. The use of an articulated, serial numbered answer 
sheet is recommended for tests short enough for all answers to be 
recorded on a single side of the sheet. Harold Bechtoldt. 





Harrell, Willard. “A Factor Analysis of Mechanical Ability 
Tests.” Psychometrika, V (1940), 17-33. 


The intercorrelations of 37 variables, including the Minnesota 
battery of “mechanical ability” tests, the seven MacQuarrie tests 
of “mechanical ability,” O’Connor’s Wiggly blocks, and the Sten- 
quist picture-matching test, were analyzed by Thurstone’s cen- 
troid method. Five factors, Perceptual, Verbal, Youth, Manual 
Agility, and Spatial, were taken out. Factors prominent in so- 
called mechanical ability tests are the Spatial and Perceptual 
ones with MacQuarrie’s dotting test significantly high in the 
Manual Agility factor. Each of the factors can be measured with 
group pencil-and-paper tests. (Courtesy Psychometrika.) 





Harrell, T. W. and Faubion, R. W. “Selection Tests for Aviation 
Mechanics.” Journal of Consulting Psychology, IV (1940), 
104-05. 


Students of the United States Army Air Corps Technical 
Schools take a basic course of Shop Mathematics, Mechanical 
Drafting and Blueprint Reading, Air Corps Fundamentals,
Metal Work, and Electricity besides specializing in some field. 
Correlations of 38 tests with these five basic courses range from 
-.20 to +.54. Four tests give a multiple correlation of .72 with
a composite basic grade. A factor analysis is being made of 24 
of these variables. Harold Bechtoldt. 





Johnson, A. P. “A Study of One Company’s Criteria for Selecting 
College Graduates.” Journal of Applied Psychology, XXIV 
(1940), 253-64. 


A company had for some years considered applications for 
sales positions (personnel, advertising, and sales promotion) on 
the basis of an intelligence test, a vocabulary test, and ratings on 
family background, industriousness, extroversion-introversion, 
and flair for writing. The present study examines the data for 80 
applicants (41 hired, 39 rejected) and seeks to objectify the rat- 
ings and to establish estimates of their reliability and validity. 
The combined ratings of six members of a class in industrial psy- 
chology showed satisfactory reliability. Ratings on “writing 
flair” most markedly differentiated the hired from the rejected. 
Ratings on “family background” showed the highest correlation 
(+0.40 ± 0.12) with service or merit ratings made by five company
executives on 23 workers. W. A. Varvel.





McCloy, C. H. “The Measurement of Speed in Motor Perform- 
ance.” Psychometrika, V (1940), 173-82. 


When the centroid method of factor analysis was applied to 
two sets of data on athletic performances, three significant fac- 
tors emerged: strength, velocity, and dead weight. Scores on this 
speed factor were predicted by the multiple regression technique, 
the factor loadings on the speed factor being used as the criterion 
correlations, and these predicted scores were correlated with each 
of the other variables. When the original tables, augmented by 
the new speed variable, were refactored, the computed speed fac- 
tor fell on the speed axis as a primary trait. It is thus shown 
that it is possible to isolate and measure a factor which appears 
in variables under consideration only as a compound. (Courtesy 
Psychometrika.) 





Palmer, C. E., and Klein, H. “A Table of the Double Integral of 
the Gaussian Probability Function.” Child Development, XI 
(1940), 61-8. F. A. Kingsbury. 



Roslow, S., Wulfeck, W. H., and Corby, P. G. “Consumer and
Opinion Research: Experimental Studies on the Form of the 
Question.” Journal of Applied Psychology, XXIV (1940), 
334-46. 


Summaries of the results of eight studies on varying the form 
of questions are given. Alternate forms of the questionnaire, 
successive forms one month apart, and free response questions 
were among the methods used. The use of stereotypes or emo- 
tionally charged words produced significant changes in responses. 
Slight changes in wording may or may not result in changes in 
frequencies of the response choices. The completeness and num- 
ber of alternatives offered in check lists tend to influence the 
proportions for any one response, while the results from free- 
response questions may be definitely misleading. Harold Bech- 
toldt. 





Sarbin, T. R., and Berdie, R. F. “Relation of Measured Interests 
to the Allport-Vernon Study of Values.” Journal of Applied 
Psychology, XXIV (1940), 287-96. 


Fifty-two university students were given the Allport-Vernon 
Scale and the Strong Vocational Interest Blank, Form M. A 
modification of the pattern analysis described by Darley was 
applied to the Strong profiles. Occupational keys were grouped 
according to the results of factor analysis studies. “A few of the 
occupational groups showing measured interest patterns are char- 
acterized by certain profiles on the Allport-Vernon Scale.” Al- 
though there is considerable overlapping between groups, “it is 
possible, nevertheless, . . . to use the Allport-Vernon Scale to 
approximate certain occupational interest types as measured by 
Strong. Thus, a definite but limited use is demonstrated for the 
Allport-Vernon scores when it is desirable to distinguish or 
identify vocational interest types in the professional, sales, or 
‘uplift’ occupations.” W. A. Varvel. 





Schultz, R. S. “Preliminary Study of an Industrial Revision of 
the Revised Minnesota Paper Form Board Test.” Journal of 
Applied Psychology, XXIV (1940), 463-67. 


The Likert-Quasha Revised Minnesota Paper Form Board 
Test (Form AA) was further revised for industrial use in order 
to decrease the demand on verbal comprehension of the instruc- 
tions and to simplify the response required. A preliminary study 
of this industrial revision is reported. Correlations with the 
Revised Minnesota range from +.71 to +.86. Scores on the industrial
revision tend to be significantly higher. Twenty-one engineering
students obtained a higher average score than did 42
trade-school boys and 57 high-school girls. Correlations with
intelligence correspond with those found in previous studies with 
the Paper Form Board Test. W. A. Varvel. 





Seder, M. “The Vocational Interests of Professional Women.” 
Part II. Journal of Applied Psychology, XXIV (1940), 265-72.


Sixty women physicians and 69 life insurance saleswomen 
filled out both the men’s form and the women’s form of the 
Strong Vocational Interest Blank. For the 268 items common to 
the two forms, the median number of discrepant responses was 
18 per cent, so that the test-retest reliability is considered satis- 
factory. The common items are as heavily or more heavily 
weighted than items occurring on only one blank. In general 
there is substantial agreement between the weights assigned to 
the response to each item by the men’s key and by the women’s 
key for the same occupation. “All indications of this study are 
that differences between sexes in an occupation are usually less 
frequent and less important than similarities.” It is suggested 
that a common blank should be composed and that where sex 
differences actually appear an occupational key for each sex 
should be constructed. W. A. Varvel. 





Thurstone, L. L. “Experimental Study of Simple Structure.” Psy- 
chometrika, V (1940), 153-68. 


A battery of 36 tests was given to a group of high-school 
seniors. The factorial analysis reveals essentially the same primary
factors that were found in previous studies. The test battery
reveals a simple structure. (Courtesy Psychometrika.)





Tucker, Ledyard R. “The Role of Correlated Factors in Factor 
Analysis.” Psychometrika, V (1940), 141-52. 


The fundamental factor theorem is developed in matrix form 
for the case of correlated factors. The properties of the corre- 
lated factor system are discussed, and some effects of sampling 
error considered. The psychological meaning of correlated fac- 
tors is discussed, and several mechanisms by which general fac- 
tors may operate in the factorial system are indicated. (Courtesy 
Psychometrika.)





Walker, Helen M. “Degrees of Freedom.” Journal of Educational 
Psychology, XXXI (1940), 253-69. 


The number of degrees of freedom is a basic concept in small 
sample theory. Most textbooks omit a discussion of this topic, 
and many texts give incorrect formulae and procedures because 
of ignoring it. The development starts with the freedom of movement
of a point in space under certain restraining conditions and
utilizes the representation of a statistical sample by a single point 
in N-dimensional space. Illustrations are presented showing how 
to determine the number of degrees of freedom appropriate for 
use in certain common situations, as standard error of the mean, 
Chi-square test, contingency tables, partial correlation, and anal- 
ysis of variance formulae. Harold Bechtoldt. 





Young, P. V. “The Validity of Schedules and Questionnaires.” 
Journal of Educational Sociology, XIV (1940), 22-6. 


A brief summary is given of an experiment with a variety of 
questionnaires and schedules as used on three considerably homo- 
geneous communities. Objective, quantitative data were difficult 
to obtain. The data shed little light on complexities of social 
patterns or on behavior patterns of cultural worlds in relation to 
social life and personality adjustment. A review is given of some 
problems involved in the construction of such instruments and of 
circumstances when they can most advantageously be used. Calvin
Taylor.


MEASUREMENT NEWS* 


A Personnel Research Section has recently been established 
in the War Department under the Adjutant General. The func- 
tion of the section is to devise and assemble procedures for the 
classification of military personnel. Dr. W. V. Bingham is the 
director of this section. Among the professional members of the 
staff are Dr. T. W. Harrell, on leave from the University of
Illinois; Mr. W. M. Shanner, on leave from the University of
Chicago; and Dr. Willis Schaefer, formerly of the University of
Chicago. This section has the technical advice of a National Re- 
search Council committee on the Classification of Military Per- 
sonnel. Members of the committee are: 


Drs. Walter V. Bingham; Carl C. Brigham, Princeton Univer- 
sity; Henry E. Garrett, Columbia University; L. J. O’Rourke, 
United States Civil Service Commission; Marion W. Richardson, 
United States Civil Service Commission; Carroll L. Shartle, So- 
cial Security Board; and L. L. Thurstone, University of Chicago. 





* Notes for this department should be sent to Dr. M. W. Richardson,
United States Civil Service Commission, Washington, D. C.


As a part of the national defense program the Occupational
Analysis Section of the United States Bureau of Employment
Security is making job analyses of occupations in the United
States Army. Over seven thousand analyses will be made and
job specifications will be prepared to aid the Army in making its 
assignments of personnel. The Army is also using the Oral Trade 
Questions which have been developed by the Employment Serv- 
ice, as well as the recently published “Dictionary of Occupa- 
tional Titles.” New aptitude tests developed by the Occupational 
Analysis section are being released to both the Army and the 
Navy. 

Another field of activities of the Occupational Analysis Sec- 
tion is that of assisting local employment offices to select rapid 
learners for defense jobs. New aptitude tests are being developed 
for this task. New trade tests are also being constructed for 
defense jobs requiring highly skilled workers. The greatest at- 
tention is being given to those jobs which are important both to 
the armed forces and to the civilian defense industries. 

The Occupational Analysis Section is under the supervision 


of Dr. C. L. Shartle. 





The Washington Psychometric Society was organized Novem- 
ber 13, 1940, with a charter membership of eleven. The following 
officers were elected: M. W. Richardson, president; N. J. Van 
Steenberg, secretary; C. R. Brolyer, treasurer. It is planned to 
hold meetings once a month. 





Machine methods as applied to the field of measurement 
formed the major subject of discussion at an “Educational Re- 
search Forum” held at the Homestead of the International Busi- 
ness Machines Corporation at Endicott, New York, during the 
week of August 26 to 31. A limited number of transcripts of the 
proceedings are available to those interested. Requests for tran- 
scripts should be sent to Mr. E. C. Schroedel, Manager, Institu- 
tional Department, International Business Machines Corporation, 
590 Madison Avenue, New York, New York. 

The papers presented at the Forum are listed below.


Computation of Statistical Constants:
“The Value of the Collator in Using Prepunched Cards for
Obtaining Moments and Product Moments.”—Alan D. Meacham
“The Computation of Means, Standard Deviations and Correlations
by Use of the Tabulator When the Numbers are Either
Positive or Negative.”—Jack W. Dunlap
“Summary of Problems in Computation of Statistical
Constants.”—Paul S. Dwyer
“The Design of Tabulating Procedures in Relation to Automatic
Error Control in Statistical Analysis.”—Charles R. Langmuir
“Code Numbers and Coding as Aids to Research.”—Herbert A.
Toops


Classification and Prediction:
“Four Aspects of Factor Analysis. A Problem for Which Machine
Procedures are Needed.”—Harry H. Harman
“Use of Tabulating and Scoring Machines in Factor
Analysis.”—Ledyard R. Tucker
“Canonicals.”—Irving Lorge
“A Successive Approximation Solution for Prediction Problems
Involving a Large Number of Variables.”—John C. Flanagan
“Problems of Classification of Personnel in the Army.”—Truman
L. Kelley
“Army Testing Problems.”—T. W. Harrell


Test Construction:
“Computing Difficulty Index and Validity Index in Item Analysis
by IBM Machines.”—John M. Stalnaker
“Item Analysis by Test Scoring Machine Graphic Item
Counter.”—John C. Flanagan
“Repetitive Scoring of Interest and Personality Tests in Developing
Item Weights by an Iterative Process.”—Robert T. Rock, Jr.


Testing Programs:
“The Integration of the Test Scoring Machine with Tabulating
Equipment in a System of Progress Tests and Comprehensive
Examinations.”—J. V. McQuitty
“Applications of Electric Accounting Machines in Reporting
Individual and Group Results in a Testing Program.”—Charles
R. Langmuir
“The Facilitation of the Analysis and Distribution of College
Entrance Test Data in a Statewide Testing Program.”—E. L.
Stromberg





In the spring of 1939, at the request of a number of school 
teachers and administrators throughout the United States, the 
American Council on Education appointed the National Committee
on Teacher Examinations, and authorized it to supervise and
delegate to the Cooperative Test Service of the American Council 
the task of preparing a battery of objective tests for the examina- 
tion of teaching candidates. The National Teacher Examinations 
were administered for the first time in various centers throughout 
the United States on March 29-30, 1940. 


New editions of the Teacher Examinations are being prepared 
for administration in 1941. The tests cover such areas as under- 
standing and use of the English language; reasoning ability; 
knowledge of contemporary affairs; general cultural information; 
understanding of professional educational points of view, goals, 
attitudes, and methods; and mastery of subject matter to be 
taught. All examinations are objective, consisting of short 
answer items involving multiple choice response. In 1941 the 
National Teacher Examinations will be administered over two full 
days. Approximately twelve hours of testing time are required 
for the examinations. 


The dates which have been named by the National Committee 
for the administration of the Teacher Examinations in 1941 are 
March 14 and 15. 


The examinations have, of necessity, been limited to intellec- 
tual, academic, and cultural materials. Other important factors 
that determine teaching success, such as training, experience, 
personality characteristics, social adaptability, and others are 
judged independently by the local authority to whom the candi- 
date applies. 





A revised series of Cooperative General Achievement Tests 
was introduced this fall by the Cooperative Test Service. The 
revised series beginning with Form QR includes Test I: A Test 
of General Proficiency in the Field of Social Studies; Test II: 
A Test of General Proficiency in the Field of Natural Sciences; 
and Test III: A Test of General Proficiency in the Field of 
Mathematics. These general proficiency tests are not composed 
of questions dealing with topical content of the fields covered. 
Instead, each test is divided into two parts: the first, testing for 
knowledge of the terms and concepts essential to an understand- 
ing of the area in question; the second, testing the student’s abil- 
ity to comprehend and interpret typical materials in the fields. 